Michael Burry is back in the headlines with a new “Big Short” style call. This time he is not targeting subprime mortgages. He is targeting the AI boom itself, and specifically the way tech giants depreciate their GPU fleets. He argues that hyperscalers like Meta and Oracle, and even neoclouds like CoreWeave, are stretching the useful life of AI hardware far beyond reality, hiding as much as 176 billion dollars of future expenses and inflating reported earnings along the way.

It is a clean, scary narrative. Expensive chips, aggressive accounting, short shelf life, big crash. The problem is that it treats AI infrastructure as if it were just another tech bubble spreadsheet, instead of a living system that is being reshaped by the age of inference and agentic AI. Once you look at how these chips are actually used, Burry’s thesis stops looking like a brave contrarian insight and starts looking like a static snapshot of a very dynamic machine.

What Burry Thinks Is Hiding In The Numbers

Burry’s core claim is simple. Hyperscalers are extending the depreciable lives of AI hardware from something like three to four years out to six, seven, even eight years. By spreading the cost over a longer period, they reduce annual depreciation, which mechanically boosts operating income. If the hardware is really only economically useful for, say, three to five years, then those earnings are overstated and will eventually be corrected in a nasty way.
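
To see the mechanics, consider a minimal sketch with hypothetical numbers; the fleet cost and the candidate lives below are placeholders, not any company’s actual figures. Straight‑line depreciation spreads cost evenly over the estimated life, so stretching the life directly shrinks the annual charge:

```python
# Hypothetical illustration of the mechanical effect Burry describes:
# the same GPU fleet, straight-line depreciated over different lives.
FLEET_COST = 12_000_000_000  # hypothetical fleet cost in dollars

for useful_life_years in (3, 4, 6, 8):
    annual_depreciation = FLEET_COST / useful_life_years
    print(f"{useful_life_years}-year life: "
          f"${annual_depreciation / 1e9:.1f}B depreciation per year")

# 3-year life: $4.0B depreciation per year
# 4-year life: $3.0B depreciation per year
# 6-year life: $2.0B depreciation per year
# 8-year life: $1.5B depreciation per year
```

Going from a three‑year to a six‑year life halves the annual charge, and every avoided dollar of depreciation flows straight into operating income. That is the entire mechanical core of the thesis.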

He points to companies like Oracle and Meta, which have disclosed changes in depreciation schedules, and to analysis suggesting this could inflate cumulative profits by over 20 percent later this decade. Commentators have picked up the thread and extended it. Bloomberg, for example, notes that Nvidia’s GPUs are costly, short‑lived assets with an estimated shelf life of “perhaps five years” just as Big Tech is pushing useful lives to six. Others warn that mis-estimated lives could become a “depreciation crisis” for both hyperscalers and “neoclouds” like CoreWeave.

On the surface, it sounds damning. But the way depreciation actually works, and the way AI systems are evolving, tells a very different story.

How Depreciation Actually Works When Reality Changes

Depreciation is not a moral statement. It is an estimate. Accounting standards like US GAAP explicitly treat useful life as a management estimate that must be updated when new information arrives. Guidance on changes in estimates uses the useful life of property and equipment as the textbook example.

If a company discovers that its servers and networking gear are lasting longer, staying more reliable, or being kept fully utilized for more years than expected, it is not just allowed to change the depreciation schedule. It is expected to. Those changes are accounted for prospectively. Past financials are not restated. The new, better estimate is applied going forward.
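
For concreteness, here is a minimal sketch of the prospective treatment, using invented figures. The remaining net book value is simply spread over the remaining years of the new estimate; prior periods stay as reported, and the same arithmetic applies whether the life is lengthened or shortened:

```python
def prospective_annual_depreciation(cost, years_elapsed, old_life, new_life):
    """Annual straight-line charge after a change in useful-life estimate.

    Prior years are not restated; the remaining net book value is
    spread over the remaining years of the new estimated life.
    """
    net_book_value = cost - (cost / old_life) * years_elapsed
    remaining_years = new_life - years_elapsed
    return net_book_value / remaining_years

# Hypothetical server bought for $10,000, two years into a 4-year life
# (old charge: $2,500 per year). Extending the life to 6 years:
print(prospective_annual_depreciation(10_000, 2, 4, 6))  # 1250.0
# Shortening a 6-year estimate to 5 years works the same way in reverse:
print(prospective_annual_depreciation(10_000, 2, 6, 5))  # ~2222.22
```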

That is exactly what has happened at the hyperscalers. Alphabet completed a lifecycle study and concluded its servers and some networking gear could be run for six years instead of the three to four that used to be standard. It told investors this would cut 2023 depreciation by roughly 3.4 to 3.9 billion dollars and boost net income accordingly. Meta and Microsoft have made similar moves, lengthening server lives to roughly five and six years.

You can read this as conveniently flattering to reported earnings. You can also recognize that it is exactly what the rule book says to do when reality changes. The key question is not “did they extend lives?” but “do those new lives match what is actually happening in the data centers?”

The Age Of Inference Turns “Old” GPUs Into Workhorses

Burry’s view implicitly assumes that once a new chip generation arrives, older GPUs slide into irrelevance. That might have been a decent approximation in a world dominated by a few big training runs. It makes far less sense in the world we are moving into now, where inference and agents soak up compute continuously.

Inference is not a niche side effect anymore. Hyperscalers and analysts expect inference workloads to dominate AI compute demand as models are deployed into products, autonomous agents, copilots, and background workflows that run for minutes or hours per user session. Google explicitly describes this shift as the “age of inference” and is now shipping dedicated inference TPUs like Ironwood to power long‑running “thinking models” and agentic systems at scale.

In that world, the job is less about doing a few massive training runs on the absolute latest chip and more about serving billions of token generations, queries and agent steps every single day. You do not need the leading‑edge GPU for every one of those tasks. Many inference jobs, especially for smaller or quantized models, are perfectly happy to run on hardware that is a generation or two behind, as long as it is cheap and available. That is exactly the kind of workload that keeps “old” GPUs useful for much longer than a simple resale‑price chart would suggest.
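
Some rough arithmetic shows why. In the back‑of‑the‑envelope sketch below, every throughput, power, and cost figure is a hypothetical placeholder rather than a real benchmark or price, but the structure of the comparison is the point: once a chip’s capital cost is fully depreciated, what matters is its marginal cost per token.

```python
# Back-of-the-envelope cost-per-token comparison. All throughput,
# power, and capex figures are hypothetical placeholders.
def cost_per_million_tokens(tokens_per_sec, power_kw, power_cost_kwh,
                            hourly_capex):
    """Hourly operating cost divided across the tokens served per hour."""
    tokens_per_hour = tokens_per_sec * 3600
    hourly_cost = power_kw * power_cost_kwh + hourly_capex
    return hourly_cost / tokens_per_hour * 1_000_000

# New flagship GPU: 4x the throughput, but capex still being amortized.
new_gpu = cost_per_million_tokens(tokens_per_sec=4000, power_kw=1.0,
                                  power_cost_kwh=0.08, hourly_capex=2.00)
# Older GPU: slower and hungrier per token, but fully depreciated.
old_gpu = cost_per_million_tokens(tokens_per_sec=1000, power_kw=0.4,
                                  power_cost_kwh=0.08, hourly_capex=0.00)

print(f"new GPU: ${new_gpu:.3f} per million tokens")  # ~$0.144
print(f"old GPU: ${old_gpu:.3f} per million tokens")  # ~$0.009
```

The specific numbers do not matter. What matters is that for any workload the older chip can still serve at acceptable latency, the economics favor keeping it busy over scrapping it.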

As agentic systems spread, that effect compounds. A simple chat request might have been a few hundred milliseconds of compute on a cutting‑edge GPU. A complex agent that plans, calls tools, retrieves data, and iterates on a task can run for minutes, with many more tokens and far more decision steps. That is a massive shift in inference demand, and it creates an enormous amount of work for any chip that is still good enough to serve those requests at reasonable latency.


CoreWeave’s leadership recently asserted that Nvidia A100 GPUs, despite being introduced in 2020, remain in high demand and fully booked as of late 2025. CEO Michael Intrator told CNBC that the company’s A100 inventory continues to be fully utilized and argued that the infrastructure retains value, defending its practice of depreciating these AI chips over a six‑year period.

Evidence From Google: Eight-Year-Old TPUs At Full Throttle

We do not have to guess about whether older accelerators can stay useful. Google has already told us. In an October 2025 talk, Amin Vahdat, Google’s VP for AI and infrastructure, said that Google currently has seven generations of TPUs in production and that its seven- and eight-year-old TPU generations are running at 100 percent utilization.

That is a direct, real‑world data point that flatly contradicts the idea that AI accelerators are economically dead after three years. These are not museum pieces sitting in a corner. They are fully booked. In fact, Vahdat said that TPU demand is so strong that Google is turning customers away, even while those oldest TPUs remain in service.

When a company can keep eight-year-old accelerators running flat out, extending their accounting life from three or four years to six looks less like “cooking the books” and more like aligning the books with reality.

GPU Lifespan Is Economic, Not Just Technological

Critics of the AI trade point to the brutal price decline of older GPUs. One analysis of CoreWeave’s IPO noted that Nvidia’s V100 GPU, launched in 2017 and once costing around ten thousand dollars, can now be bought for a few hundred dollars and is several generations behind current architectures. The piece called six years of useful life “an eternity” in AI.

That kind of observation is factually correct and still misses the point. There is a difference between the resale price of a used GPU and its economic value to an operator with a mountain of inference demand. A cheap, fully depreciated chip that can sit in a rack and serve non‑frontier workloads at high utilization is extraordinarily valuable, even if secondary market prices have collapsed.

This is especially true for hyperscalers with massive internal workloads. They can tier their fleets: latest‑gen chips for cutting‑edge training and the most demanding inference, slightly older chips for mainstream inference, and still older ones for background tasks, experimentation, or long‑tail internal uses. That entire “compute waterfall” increases the effective life of the asset, even while the market obsesses over the next shiny architecture.
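
As a toy illustration of that waterfall, here is a sketch with invented tier names, capacity thresholds, and jobs; none of it reflects any operator’s real scheduler, which would weigh latency, memory, interconnect, and much more:

```python
# Toy "compute waterfall": route each job to the oldest (cheapest)
# GPU tier that still meets its requirements. All tiers, thresholds,
# and jobs are invented for illustration.
TIERS = [  # ordered oldest to newest
    {"name": "gen-minus-3", "max_model_b": 13,   "training": False},
    {"name": "gen-minus-2", "max_model_b": 70,   "training": False},
    {"name": "gen-minus-1", "max_model_b": 180,  "training": True},
    {"name": "latest-gen",  "max_model_b": 2000, "training": True},
]

def route(job):
    """Return the oldest tier that can handle the job, if any."""
    for tier in TIERS:
        fits = job["model_b"] <= tier["max_model_b"]
        trainable = tier["training"] or not job["training"]
        if fits and trainable:
            return tier["name"]
    return "reject"

jobs = [
    {"name": "quantized copilot",  "model_b": 8,    "training": False},
    {"name": "mainstream chat",    "model_b": 70,   "training": False},
    {"name": "frontier train run", "model_b": 1500, "training": True},
]
for job in jobs:
    print(f'{job["name"]} -> {route(job)}')
# quantized copilot -> gen-minus-3
# mainstream chat -> gen-minus-2
# frontier train run -> latest-gen
```

Every tier that still clears some job’s bar stays utilized, which is exactly what stretches the economic life of the fleet.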

Depreciation Schedules Are Adapting Both Ways

Another piece that undercuts Burry’s story is what happened at Amazon. For several years, hyperscalers extended the estimated lives of servers to about six years after internal studies showed they could safely run hardware longer and squeeze more value out of it.

In early 2025, however, Amazon Web Services did the opposite for part of its fleet. On an earnings call, AWS said the pace of AI and machine learning innovation had increased and it therefore shortened the useful life of some servers and networking gear from six years to five, which reduced operating income by hundreds of millions of dollars.

In other words, the knob turns both directions. When hardware is clearly lasting longer and staying productive, useful life estimates go up. When new architectures like Blackwell arrive and make some portions of the fleet relatively less attractive, estimates can come back down. That is exactly how the system is supposed to work. It is not evidence of a one‑way conspiracy. It is evidence that management teams are reacting to technology and demand signals as they go.

Inference Demand Is A Tailwind For Longer Lives

If you zoom out from quarter‑to‑quarter accounting, the structural story is simple. The world is building out AI factories that will run generative models around the clock. Inference demand is growing with user counts, token volumes, and the rise of agentic workflows that keep models busy for sustained periods instead of just answering one short prompt.

Every time a company wires an AI copilot into a product, launches an internal agent to help employees, or automates a workflow, it is effectively signing up for more inference compute. That demand is not anchored to only the very newest chips. It spills down through the entire stack. As long as a GPU or TPU can do useful work at an acceptable cost per token, there is a strong incentive to keep it powered, cooled and connected.

That incentive is exactly what supports longer useful lives. If the physical hardware remains reliable, the software stack keeps improving, and the workload mix broadens, then the economic life of the asset extends. The books should reflect that. Pretending that anything older than three years is worthless would be far more misleading.

(Source: METR)

Inference demand is positioned to surge as AI agents keep extending how long they can work autonomously, with each doubling in task length expanding the compute a single session consumes.
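
As a rough illustration of that compounding, here is a sketch that assumes, per METR’s task‑horizon trend, that the length of tasks agents can complete doubles roughly every seven months, and further assumes that per‑session compute scales with task length (both are assumptions, not measurements):

```python
# Hypothetical compounding of per-session agent compute, assuming it
# doubles every ~7 months in line with METR's task-length trend.
DOUBLING_MONTHS = 7
for months in (12, 24, 36):
    growth = 2 ** (months / DOUBLING_MONTHS)
    print(f"after {months} months: ~{growth:.0f}x compute per session")
# after 12 months: ~3x
# after 24 months: ~11x
# after 36 months: ~35x
```

Even a fraction of that growth is the kind of demand curve that keeps every serviceable chip in a rack.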

Where The Real Risk Actually Is

None of this means depreciation risk is imaginary. There are real scenarios where Burry’s instincts could prove partly right. If AI demand disappoints, agentic use cases stall, or a radically more efficient architecture makes even mid‑life chips uneconomic to power compared with alternatives, then useful life estimates will need to be cut and depreciation front‑loaded. Analysts already worry this is a particular risk for GPU‑rental specialists like CoreWeave, which live or die by the speed at which they can amortize their fleets.

But that is a very different claim than “there is 176 billion dollars of guaranteed artificial profit waiting to be reversed.” The real risk sits in operator‑specific choices about architecture, product strategy and pricing. It is not a universal, mechanical time bomb. Some firms will manage the shift with multi‑tier fleets and software optimization. Others may misjudge and take a hit. That is how competitive markets usually work.

The Bottom Line: A Static Thesis In A Dynamic System

Burry’s argument resonates because it rhymes with his greatest hit. Expensive assets, optimistic assumptions, a crowd of believers and a grim arithmetic lurking underneath. The problem is that AI hardware today is not like the misrated mortgage tranches of 2006. It lives in a fast evolving ecosystem where inference demand is exploding, agentic workloads are coming online, and older accelerators are still running flat out eight years after deployment.

When you take that reality seriously, extending useful lives from three or four years to something closer to six looks less like deception and more like updating the model to match how the assets are actually used. Depreciation is supposed to follow economic life. In AI, economic life is being stretched by software progress and the rise of inference, not cut short. That is why, on the specific issue of GPU depreciation, Michael Burry is not uncovering a hidden trap. He is fighting the last war while the infrastructure of the next one is already humming along on “old” chips that are busier than ever.
