What Should Frontier-Model Access Cost Your Engineering Team?
The question I get from CFOs is "what is the AI bill going to be," and the honest first answer is that most of the scary numbers come from modelling people as if they were pipelines. The cheapest access route for an individual (a free tier, a single subscription) is rarely the right one for a team, and the most expensive line is never the one on the licence quote. Here is how the cost actually decomposes, and where the false economies hide.
Cost per active engineer is the only number that matters
Frontier-model access has two basic billing shapes: a flat seat (a coding-assistant subscription, a paid chat plan) or metered tokens (the API). For the interactive, human-in-the-loop work that fills most of an engineer's day, the seat is cheaper, and it is not close. A person typing prompts cannot physically consume enough tokens to beat a flat monthly fee; the seat is priced on the assumption that you will try and fail. When a team's AI budget looks alarming, it is almost always because someone modelled every engineer at automated-pipeline token rates. People are not pipelines.
So the number to govern is cost per active engineer per month, and the baseline is a premium coding-assistant seat plus modest API headroom for the spiky work. For the per-tool free-tier limits and the seat-versus-API breakeven in detail, We The Flywheel has the field data: the cheapest-access cluster and the subscription-vs-API breakdown. This page is about what to do with those numbers once you are buying for fifty or five hundred people instead of one.
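The seat-versus-API breakeven and the cost-per-active-engineer figure can be sketched in a few lines. This is a minimal illustration, not a pricing model: the `Pricing` class, the function names, and every dollar figure are hypothetical placeholders, and a single blended token rate stands in for what are normally separate input and output rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    seat_usd_per_month: float      # hypothetical flat seat price
    usd_per_million_tokens: float  # hypothetical blended API rate (input + output)


def api_cost_usd(p: Pricing, tokens_per_month: float) -> float:
    """Metered spend for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * p.usd_per_million_tokens


def breakeven_tokens_per_month(p: Pricing) -> float:
    """Monthly token volume at which metered spend equals one seat."""
    return p.seat_usd_per_month / p.usd_per_million_tokens * 1_000_000


def cost_per_active_engineer(
    p: Pricing, active_engineers: int, shared_api_tokens_per_month: float
) -> float:
    """One seat per active engineer plus shared API headroom, averaged per head."""
    seats = active_engineers * p.seat_usd_per_month
    headroom = api_cost_usd(p, shared_api_tokens_per_month)
    return (seats + headroom) / active_engineers


if __name__ == "__main__":
    # Placeholder numbers only; substitute your own negotiated prices.
    example = Pricing(seat_usd_per_month=40.0, usd_per_million_tokens=10.0)
    print(breakeven_tokens_per_month(example))
    print(cost_per_active_engineer(example, 50, 100_000_000))
```

Swapping in your own seat price and token rate shows how much interactive usage it takes before a metered key would cost more than the seat, and how far the shared API headroom moves the per-engineer figure.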
The false economy of free tiers at org scale
Most providers offer a free tier, and for an individual evaluating a model it is a genuine gift. For a team, routing real work through free tiers is a false economy, because the bill is paid in three currencies that never appear on an invoice. First, rate limits: an engineer stalled mid-task waiting on a throttled free key costs more in salary-minutes than the seat would have. Second, no SLA: when the provider throttles under load, your team's productivity is a function of someone else's free-tier capacity planning. Third, and most serious, data: most free tiers reserve the right to train on your prompts and outputs. At org scale that means proprietary code and customer data flowing into a training set no one signed off on, a governance and IP exposure that dwarfs the licence saving.
The rule is simple. Free tiers are for evaluation. Production runs on a paid tier with a data-processing agreement that contractually keeps your data out of training. The saving you think you are capturing with free access is a liability you are quietly taking on.
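The first of those three currencies, salary-minutes lost to rate limits, is the easiest to put a number on. The sketch below is a rough estimate under stated assumptions: the function name, the loaded salary, the stalled minutes per day, and the working-time defaults are all hypothetical inputs, not measured data.

```python
def monthly_stall_cost_usd(
    loaded_salary_usd_per_year: float,
    stalled_minutes_per_day: float,
    working_days_per_month: int = 20,    # assumption
    working_hours_per_year: int = 1800,  # assumption
) -> float:
    """Salary cost of time one engineer spends waiting on a throttled key."""
    usd_per_minute = loaded_salary_usd_per_year / (working_hours_per_year * 60)
    return usd_per_minute * stalled_minutes_per_day * working_days_per_month


if __name__ == "__main__":
    # Placeholder inputs: compare the result with your monthly seat price.
    print(monthly_stall_cost_usd(216_000, stalled_minutes_per_day=10))
```

The comparison to make is this figure against the price of one seat for the same month; the SLA and data-exposure costs come on top and do not reduce to a formula this simple.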
Standardise the default, allow documented exceptions
The most expensive access pattern I see is not over-paying for the top model; it is sprawl. Every engineer on a different assistant, each with a personal API key expensed individually, produces three problems at once: ungoverned spend you cannot forecast, a dozen different data-handling terms you have never read, and a security surface no one owns. The fix is not to ban choice; it is to make the default the path of least resistance. One negotiated coding-assistant seat for everyone, one shared metered API account for automation, one data-processing agreement, one dashboard for spend. Allow a documented exception process for the engineer whose workflow genuinely needs something else, and most will never invoke it.
The hidden cost is inference at scale, not the seat
When a CFO is surprised by an AI bill, the surprise is almost never the per-seat licence — it is inference at production scale. A capability that looks cheap across ten engineers in a pilot behaves very differently when it is wired into automated workflows calling the model thousands of times a day. That is the line to budget for, alongside the governance overhead (the agreement, the approved-tools list, the spend monitoring) and the organisational debt that tool sprawl leaves behind. Model the scale-up and the governance, not just the seats, and the number stops surprising people.
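The jump from pilot to production is plain multiplication, which is why it is easy to miss. A minimal sketch, assuming a single blended token rate; the function name, call volumes, tokens per call, and price are hypothetical and chosen only to show the shape of the scale-up:

```python
def monthly_inference_cost_usd(
    calls_per_day: float,
    tokens_per_call: float,
    usd_per_million_tokens: float,
    days_per_month: int = 30,  # automated workflows run every day
) -> float:
    """Metered spend for an automated workflow at a given call volume."""
    tokens = calls_per_day * tokens_per_call * days_per_month
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Placeholder volumes: a ten-engineer pilot versus a wired-in workflow.
    pilot = monthly_inference_cost_usd(200, 4_000, 10.0)
    production = monthly_inference_cost_usd(5_000, 4_000, 10.0)
    print(pilot, production, production / pilot)
```

The unit price is the same in both cases; only the call volume changes, so the budget line to forecast is calls per day at full rollout, not the cost observed during the pilot.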
The decision, in one paragraph
Buy seats for people and a shared metered API account for pipelines. Standardise the default tool, negotiate one data agreement, route cheap work to cheap models. Use free tiers to evaluate, never to serve. Budget for inference scale-up and governance, not just licences. Run the numbers for your own team in the LLM access cost calculator, and for the argument that free-quota arbitrage feels cheaper than it is, Tom Prommer's essay on the subsidy is the companion read.