What Is AI Inference and Why It Costs More Than Training

By Mohana Priya
10 Min Read

Key Highlights

  • AI inference costs have surged to represent over 70% of total AI compute spending as of Q3 2023
  • Google’s TPU v4 chips have cut inference costs by 30% compared to previous models, announced in March 2023
  • Amazon’s Inferentia chips reduced inference latency by 25% in real-world applications, according to a report from June 2023
  • Market analysts project a 40% decrease in inference costs by 2025, considerably impacting AI product pricing

AI inference is quickly becoming the heavyweight in the world of artificial intelligence. As of Q3 2023, it accounts for over 70% of total AI compute spending, leaving training far behind. But why does inference come with a higher price tag?

To understand this, we need to break down the economics of AI training versus inference, the role of custom chips, and what the falling costs of inference could mean for the future of AI products.

Understanding AI Inference and Training

At its core, AI inference is the process through which a trained AI model makes predictions based on new data. Think of it as the application phase, where the model uses what it learned during training to provide outputs.

On the flip side, training is the phase where the model learns from a dataset, adjusting its parameters to improve accuracy. Training involves massive datasets and extensive computational resources, which is why it traditionally dominated AI spending. However, as AI models become more widely deployed in real-world applications, inference is now taking center stage.
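
The difference between the two phases can be sketched with a toy model. The example below is an illustration only, not how production systems are built: the function names train and infer, the tiny dataset, and the learning rate are all assumptions made for this sketch. Training loops over the data many times to adjust parameters, while inference simply applies the finished parameters to a new input.

```python
# Toy model y = w * x + b, used only to contrast the two phases.
# All names and numbers here are hypothetical.

def train(xs, ys, epochs=5000, lr=0.01):
    """Training: repeatedly adjust w and b to shrink mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(params, x):
    """Inference: apply the frozen parameters to new input, no learning."""
    w, b = params
    return w * x + b


# Hypothetical dataset generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

params = train(xs, ys)             # done once, many passes over the data
prediction = infer(params, 10.0)   # repeated for every new request
print(round(prediction, 2))
```

Training here runs thousands of passes once, while each call to infer is a single cheap calculation. The catch is that a deployed model may answer that call millions of times.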

Companies like Google and Amazon are investing heavily in optimizing inference processes, and the financial implications are significant.

The Economics of Inference vs. Training

The economics of AI inference versus training boils down to resource allocation and cost efficiency. Training requires a largely one-time investment in extensive computational power, while inference incurs ongoing costs each time the model is applied.

According to a report by DefiLlama, companies are now allocating more resources to inference as they deploy AI solutions at scale. As a result, AI inference costs are soaring. In Q3 2023, they represented over 70% of total AI compute spending.

This shift in economics signals a need for businesses to optimize their infrastructure for inference, rather than solely focusing on training.
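
A back-of-the-envelope calculation shows how a one-time training bill can be overtaken by per-request inference costs. The dollar figures below are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise assumptions for this sketch.

```python
def inference_share(training_cost, cost_per_request, requests):
    """Fraction of total compute spend that goes to inference."""
    inference_cost = cost_per_request * requests
    return inference_cost / (training_cost + inference_cost)


def requests_for_share(training_cost, cost_per_request, share):
    """Requests served before inference reaches `share` of total spend.

    Solves share = c * N / (T + c * N) for N.
    """
    return training_cost * share / ((1 - share) * cost_per_request)


TRAINING_COST = 1_000_000.0   # hypothetical one-time training bill, in dollars
COST_PER_REQUEST = 0.002      # hypothetical cost of serving one request

# After 500 million requests, inference has cost as much as training did.
half = inference_share(TRAINING_COST, COST_PER_REQUEST, 500_000_000)

# Requests needed before inference makes up 70% of total spend.
needed = requests_for_share(TRAINING_COST, COST_PER_REQUEST, 0.70)

print(f"share after 500M requests: {half:.0%}")
print(f"requests for a 70% share: {needed:,.0f}")
```

Under these assumed numbers, a popular product would cross the 70% mark after a little over a billion requests, and the share keeps climbing for as long as the model stays in service.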

Why Inference Costs More

So, why does inference come with higher costs? The answer lies in the complexity of real-time data processing. Inference requires instant outputs, which means the computational resources need to be not just powerful but also highly efficient. This contrasts with training, where the model can be optimized over time.

Also, the demand for high availability and low latency drives up costs. Businesses need their AI systems to deliver results almost instantaneously, particularly in sectors like finance and healthcare. This urgency leads to investments in specialized hardware and software solutions that can support rapid inference.
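
One way to see why latency and peak demand translate into hardware spending is Little's law, which says the average number of requests in flight equals the arrival rate multiplied by the time each request takes. The sketch below applies it to a hypothetical service; the traffic figure, the latencies, and the per-server concurrency are assumptions for illustration, not measurements.

```python
def replicas_needed(peak_rps, latency_ms, concurrency_per_replica):
    """Servers needed to absorb peak traffic, via Little's law.

    Requests in flight = arrival rate * latency. Integer arithmetic is used
    so the ceiling division is exact.
    """
    in_flight_x1000 = peak_rps * latency_ms          # in-flight requests * 1000
    capacity_x1000 = concurrency_per_replica * 1000  # per-server capacity * 1000
    return -(-in_flight_x1000 // capacity_x1000)     # ceiling division


PEAK_RPS = 2000   # hypothetical peak requests per second
CONCURRENCY = 8   # hypothetical requests one server handles at once

baseline = replicas_needed(PEAK_RPS, 200, CONCURRENCY)  # 200 ms per request
faster = replicas_needed(PEAK_RPS, 150, CONCURRENCY)    # 25% lower latency

print(baseline, faster)  # prints "50 38"
```

Because the fleet has to be sized for the busiest moment rather than the average one, every millisecond of latency shows up as servers that must be kept running.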

GPU vs. Custom Chips: The Hardware Market

When it comes to hardware, GPUs have long been the go-to choice for AI training. They’re powerful and versatile, capable of handling complex computations. However, for inference, custom chips like Google’s Tensor Processing Units (TPUs), Amazon’s Inferentia, and Groq’s specialized processors are gaining traction.

Google’s TPU v4 chips, for instance, have shown a 30% reduction in inference costs compared to their predecessors. Amazon’s Inferentia chips have demonstrated a 25% reduction in latency, making them appealing for companies looking to optimize their AI applications.

These custom solutions are designed specifically to meet the demands of inference processing, making them more efficient than general-purpose GPUs. But do GPUs have a future in inference? The answer isn’t straightforward. While they’re still widely used, the trend favors custom chips that can provide better performance at lower costs.

As companies like Google and Amazon continue to innovate in this space, we may see a broader shift toward these specialized processors.

The Impact of Falling Inference Costs

Recent trends suggest that inference costs are set to decline notably, with market analysts projecting a 40% decrease by 2025. This could have far-reaching implications for the pricing of AI products. As the cost of running AI models decreases, companies can pass those savings on to consumers.
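
To make the projected 40% decrease concrete, the arithmetic can be sketched as follows. The monthly bill is a hypothetical figure, and the function names are assumptions for this example; only the 40% figure comes from the projection cited above.

```python
def projected_cost(current_cost, decrease):
    """Cost after a fractional decrease, e.g. 0.40 for 40%."""
    return current_cost * (1 - decrease)


def capacity_multiplier(decrease):
    """How much more inference the same budget buys after the decrease."""
    return 1 / (1 - decrease)


MONTHLY_BILL = 10_000.0   # hypothetical current inference bill, in dollars
DECREASE = 0.40           # the analysts' projected decrease

new_bill = projected_cost(MONTHLY_BILL, DECREASE)
more_capacity = capacity_multiplier(DECREASE)

print(f"projected bill: ${new_bill:,.0f}")
print(f"same budget serves {more_capacity:.2f}x the requests")
```

A company can take the saving as a lower bill, or keep its budget flat and serve roughly two-thirds more requests, which is the mechanism behind cheaper or more widely available AI products.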

Lower inference costs may lead to a surge in AI product offerings, making advanced AI technologies accessible to a broader audience. This democratization of AI could spark innovation across various sectors, from healthcare to finance. However, with these opportunities come risks.

Companies that rely heavily on inference models must adapt quickly or risk being left behind. Those that optimize for the new cost structure will emerge as leaders, while others may struggle to stay afloat.

Frequently Asked Questions (FAQs)

What is AI inference and why is it important?

AI inference is the process where a trained AI model makes predictions based on new data, acting as the application phase of AI. It has become crucial as it now represents over 70% of total AI compute spending.

Why does AI inference cost more than training?

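Training is largely a one-time investment, while inference incurs a cost every time the model is used. At scale, those ongoing costs add up, and the demand for low latency and high availability pushes companies toward specialized hardware. As a result, inference spending has surged, overshadowing the training phase.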

How have recent advancements in chips affected inference costs?

Recent advancements, like Google’s TPU v4 chips and Amazon’s Inferentia chips, have significantly reduced inference costs and latency, making the process more efficient. For example, Google’s chips cut inference costs by 30%.

What does the future hold for AI inference costs?

Market analysts project a 40% decrease in inference costs by 2025, which could have a considerable impact on AI product pricing and accessibility.

The TCB View

TCB believes AI inference will continue to dominate compute spending, with costs projected to decrease by 40% by 2025. This presents both opportunities and challenges. Companies that adapt to lower costs will thrive, while those that don’t may falter. Watch for innovations in custom chips that could reshape the efficiency of AI applications.


Mohana Priya is a staff reporter at The Central Bulletin specialising in crypto regulation, DeFi policy, stablecoin legislation, and Web3 legal frameworks. She has tracked legislative developments across the United States, the European Union, and Asia Pacific, covering the GENIUS Act, the Crypto Clarity Act, MiCA implementation, and SEC enforcement actions against digital asset issuers. Her reporting focuses on translating complex regulatory language into clear, actionable analysis for institutional readers, compliance professionals, and retail investors navigating an evolving legal landscape. She monitors primary sources including Congressional filings, SEC and CFTC dockets, and official EU regulatory publications. Her work appears exclusively at The Central Bulletin.