Key Highlights
- AI inference costs have surged to represent over 70% of total AI compute spending as of Q3 2023
- Google’s TPU v4 chips have cut inference costs by 30% compared to previous models, announced in March 2023
- Amazon’s Inferentia chips reduced inference latency by 25% in real world applications, according to a report from June 2023
- Market analysts project a 40% decrease in inference costs by 2025, considerably impacting AI product pricing
AI inference is quickly becoming the heavyweight in the world of artificial intelligence. It now accounts for a staggering 70% of total AI compute spending, leaving the process of training far behind. But why does inference come with a higher price tag?
To understand this, we need to break down the economics of AI training versus inference, the role of custom chips, and what the falling costs of inference could mean for the future of AI products.
Understanding AI Inference and Training
At its core, AI inference is the process through which a trained AI model makes predictions based on new data. Think of it as the application phase, where the model uses what it learned during training to provide outputs.
On the flip side, training is the phase where the model learns from a dataset, adjusting its parameters to improve accuracy. Training involves massive datasets and extensive computational resources, which is why it traditionally dominated AI spending. However, as AI models become more widely deployed in real world applications, inference is now taking center stage.
Companies like Google and Amazon are investing heavily in optimizing inference processes, and the financial implications are significant.
The Economics of Inference vs. Training
The economics of AI inference versus training boils down to resource allocation and cost efficiency. Training requires a one time investment in extensive computational power, while inference incurs ongoing costs each time the model is applied.
According to a report by DefiLlama, companies are now allocating more resources to inference as they deploy AI solutions at scale. As a result, AI inference costs are soaring. In Q3 2023, they represented over 70% of total AI compute spending.
This shift in economics signals a need for businesses to optimize their infrastructure for inference, rather than solely focusing on training.
Why Inference Costs More
So, why does inference come with higher costs? The answer lies in the complexity of real time data processing. Inference requires instant outputs, which means the computational resources need to be not just powerful but also incredibly efficient. This contrasts with training, where the model can be optimized over time.
Also, the demand for high availability and low latency drives up costs. Businesses need their AI systems to deliver results almost instantaneously, particularly in sectors like finance and healthcare. This urgency leads to investments in specialized hardware and software solutions that can support rapid inference.
GPU vs. Custom Chips: The Hardware market
When it comes to hardware, GPUs have long been the go to choice for AI training. They’re powerful and versatile, capable of handling complex computations. However, for inference, custom chips like Google’s Tensor Processing Units (TPUs), Amazon’s Inferentia, and Groq’s specialized processors are gaining traction.
Google’s TPU v4 chips, for instance, have shown a 30% reduction in inference costs compared to their predecessors. Amazon’s Inferentia chips have demonstrated a 25% reduction in latency, making them appealing for companies looking to optimize their AI applications.
These custom solutions are designed specifically to meet the demands of inference processing, making them more efficient than general purpose GPUs. But do GPUs have a future in inference? The answer isn’t straightforward. While they’re still widely used, the trend favors custom chips that can provide better performance at lower costs.
As companies like Google and Amazon continue to innovate in this space, we may see a broader shift toward these specialized processors.
The Impact of Falling Inference Costs
Recent trends suggest that inference costs are set to decline notably, with market analysts projecting a 40% decrease by 2025. This could have far reaching implications for the pricing of AI products. As the cost of running AI models decreases, companies can pass those savings on to consumers.
Lower inference costs may lead to a surge in AI product offerings, making advanced AI technologies accessible to a broader audience. This democratization of AI could spark innovation across various sectors, from healthcare to finance. However, with these opportunities come risks.
Companies that rely heavily on inference models must adapt quickly or risk being left behind. Those who can optimize for the new economic space will emerge as leaders, while others may struggle to stay afloat.
Frequently Asked Questions (FAQs)
What is AI inference and why is it important?
AI inference is the process where a trained AI model makes predictions based on new data, acting as the application phase of AI. It has become crucial as it now represents over 70% of total AI compute spending.
Why does AI inference cost more than training?
AI inference costs more than training because it requires extensive computational resources and custom chips designed for efficiency. As a result, it has surged in spending, overshadowing the training phase.
How have recent advancements in chips affected inference costs?
Recent advancements, like Google’s TPU v4 chips and Amazon’s Inferentia chips, have significantly reduced inference costs and latency, making the process more efficient. For example, Google’s chips cut inference costs by 30%.
What does the future hold for AI inference costs?
Market analysts project a 40% decrease in inference costs by 2025, which could have a considerable impact on AI product pricing and accessibility.
The TCB View
TCB believes AI inference will continue to dominate compute spending, with costs projected to decrease by 40% by 2025. This presents both opportunities and challenges. Companies that adapt to lower costs will thrive, while those that don’t may falter. Watch for innovations in custom chips that could reshape the efficiency of AI applications.

