Attend our Next FinOps Live Event | 26 February
A Comprehensive Guide to FinOps for AI
As artificial intelligence adoption accelerates, many organizations find themselves facing a new and daunting challenge: surging AI cloud costs that seem to spiral out of control.
During a recent session at the FinOps Weekly Summit, Gil Rozen, Co-founder and CTO of Wiv.ai, shared his firsthand experience on how to effectively manage cloud and AI costs—without slowing down innovation or business growth.
The Rising Crisis of AI Spending
The scale of the problem is significant. Reports indicate that average monthly AI budgets are expected to rise by 36% in 2025. Despite this massive investment, only 51% of teams feel confident in their ability to measure AI ROI (Return on Investment). This gap exists because many features are implemented without a clear understanding of the underlying cost drivers or how to measure their business value.
Understanding the Mechanics of AI Pricing
To control costs, organizations must first understand the pricing structures of services like AWS Bedrock. Rozen breaks down the base formula as:
Total Cost = (Input Tokens × Input Token Rate) + (Output Tokens × Output Token Rate)
Key observations include:
Output tokens are generally more expensive than input tokens, making it vital to structure model responses efficiently.
Costs vary widely between models; for instance, Anthropic’s Claude Sonnet and Amazon Nova Pro are priced very differently per token.
Additional Factors: Model type, storage, data transfer, and regional pricing all contribute to the final bill.
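The base formula above can be sketched as a small helper. The per-1K-token rates used here are placeholders for illustration, not real published Bedrock prices:

```python
# Illustrative token-based cost calculation following the base formula:
# Total Cost = (Input Tokens x Input Rate) + (Output Tokens x Output Rate).
# The rates below are hypothetical, not actual published prices.

def llm_call_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Cost of a single LLM call, with rates expressed per 1,000 tokens."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# Example: 2,000 input tokens and 500 output tokens, assuming hypothetical
# rates of $0.003 per 1K input and $0.015 per 1K output tokens.
cost = llm_call_cost(2000, 500, 0.003, 0.015)
print(f"${cost:.4f}")  # the 500 output tokens cost more than the 2,000 inputs
```

Even in this toy example, the smaller number of output tokens accounts for the larger share of the bill, which is why structuring responses efficiently matters.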
What Drives AI Costs Out of Control?
Rozen identifies two categories of drivers—organizational and technical—that lead to runaway spending.
Organizational Drivers
AI Hype and Overuse: The pressure to add AI everywhere, often treating it as a “silver bullet” for problems that could be solved with simpler code.
Uncontrolled MVPs: AI experiments scale faster than traditional software. A Proof of Concept (POC) that seems affordable in dev can “blast” in production without proper cost planning.
Shadow AI: Developers using AI cloud resources outside any approved scope or monitoring.
Technical Drivers
Token Explosion: Caused by long prompts or the anti-pattern of accumulated conversation history, where the entire chat history is resent with every new message.
Loops and Retries: AI agents can enter infinite loops or retry failed tools repeatedly, consuming tokens with every attempt.
Strongest Model Bias: Defaulting to the most powerful (and expensive) model for simple tasks like text summarization.
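One common mitigation for the token-explosion anti-pattern is a sliding window over conversation history, so only the most recent messages that fit a token budget are resent. This is a minimal sketch; the word-count token estimate is a crude stand-in for a real tokenizer:

```python
# Sliding-window history trimming to limit token explosion.
# estimate_tokens is a rough word-count proxy; production code would use
# the target model's actual tokenizer.

def estimate_tokens(message: str) -> int:
    return len(message.split())  # crude assumption, for illustration only

def trim_history(history: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(history):          # walk newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                          # older messages are dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))            # restore chronological order

history = ["hello there", "long reply " * 50, "short question"]
print(trim_history(history, budget=60))    # only the newest message fits
```

Instead of resending the entire history on every call, the prompt grows only up to a fixed budget, which also caps the per-message input cost.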
A Unified FinOps & Engineering Strategy
Rozen advocates for a dual approach that integrates FinOps early into the engineering lifecycle.
The Engineering Layer: Optimization at the Code Level
Code-First Approach: Before using an LLM, determine if the problem can be solved with simple code logic, which is faster and cheaper.
Prompt Engineering: Use prompt caching to save static parts of instructions or context, reducing both token costs and latency.
Right-Sizing Models: Avoid using the largest model for everything. Implement dynamic model routing to pick small, medium, or large models based on task complexity or customer tier.
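Dynamic model routing can be as simple as a lookup keyed on task type and customer tier. The model identifiers and task categories below are illustrative assumptions, not a real Bedrock API:

```python
# Hypothetical dynamic model router. Model names and task categories are
# illustrative placeholders, not real service identifiers.

MODEL_TIERS = {
    "small":  "example.small-model",    # cheap: classification, extraction
    "medium": "example.medium-model",   # mid-range: summarization, drafting
    "large":  "example.large-model",    # expensive: complex reasoning
}

def route_model(task: str, premium_customer: bool = False) -> str:
    """Pick a model tier based on task complexity and customer tier."""
    if premium_customer:
        return MODEL_TIERS["large"]      # premium tier always gets the best
    if task in {"classify", "extract"}:
        return MODEL_TIERS["small"]      # simple tasks stay on cheap models
    if task in {"summarize", "draft"}:
        return MODEL_TIERS["medium"]
    return MODEL_TIERS["large"]          # default: unknown/complex tasks

print(route_model("classify"))                          # small tier
print(route_model("classify", premium_customer=True))   # large tier
```

The routing criteria (task type, customer tier) come straight from the talk; in practice the decision could also factor in prompt length or latency requirements.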
Visibility and Guardrails
Granular Attribution: Use tools like AWS Inference Profiles to map spend to specific apps, teams, or tenants.
AI Efficiency Ratio: Measure cost against business metrics (e.g., AI interaction cost per feature) to determine if the implementation is actually effective.
Automated Guardrails: Implement rate limits, max token limits, and anomaly detection to catch runaway spending immediately.
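Two of these guardrails can be sketched in a few lines: a hard per-request token cap and a simple spend-anomaly check. The thresholds are assumptions chosen for illustration:

```python
# Sketch of two automated guardrails: a hard output-token cap per request,
# and a trailing-average spend anomaly check. Thresholds are illustrative
# assumptions, not recommended production values.

MAX_OUTPUT_TOKENS = 1024      # hard cap on output tokens per request
ANOMALY_MULTIPLIER = 3.0      # alert if today's spend > 3x trailing average

def enforce_token_cap(requested: int) -> int:
    """Clamp the requested output-token limit to the hard cap."""
    return min(requested, MAX_OUTPUT_TOKENS)

def is_spend_anomaly(today: float, trailing_daily: list[float]) -> bool:
    """Flag today's spend if it exceeds a multiple of the recent average."""
    if not trailing_daily:
        return False              # no baseline yet, nothing to compare
    baseline = sum(trailing_daily) / len(trailing_daily)
    return today > baseline * ANOMALY_MULTIPLIER

print(enforce_token_cap(4096))                          # clamped to 1024
print(is_spend_anomaly(950.0, [100.0, 120.0, 110.0]))   # True: runaway spend
```

The token cap bounds the worst case of any single call (including retry loops), while the anomaly check catches runaway spending at the daily level; real deployments would wire the alert into paging or an automatic kill switch.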
Asking the Right Questions
The most critical step in managing AI costs is asking the right questions during the development phase, rather than waiting for the production bill. By combining early engineering intervention with robust FinOps automation, companies can ensure their AI initiatives provide real business value rather than just high cloud invoices.
As Rozen emphasizes, the cheapest AI call is the one you don’t need to make. Moving forward, the goal for any organization should be a controlled environment where experiments are measured and scaled responsibly.