New research shows that the way AI services bill by tokens hides the true cost from users. Providers can quietly inflate charges by fudging token counts or slipping in hidden steps. Some systems run extra processes that don't affect the output but still show up on the bill. Auditing tools have been proposed, but without real oversight, users are left paying for more than they realize.
In nearly all cases, what we as consumers pay for AI-powered chat interfaces, such as ChatGPT-4o, is currently measured in tokens: invisible units of text that go unnoticed during use, yet are counted with exact precision for billing purposes; and though each exchange is priced by the number of tokens processed, the user has no direct way to confirm the count.
Despite our (at best) imperfect understanding of what we get for our purchased 'token' unit, token-based billing has become the standard approach across providers, resting on what may prove to be a precarious assumption of trust.
Token Words
A token is not quite the same as a word, though it often plays a similar role, and most providers use the term 'token' to describe small units of text such as words, punctuation marks, or word-fragments. The word 'unbelievable', for example, might be counted as a single token by one system, while another might split it into un, believ and able, with each piece increasing the cost.
This system applies to both the text a user inputs and the model's reply, with the price based on the total number of these units.
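As a rough illustration of how counts can diverge, the short Python snippet below uses the open-source tiktoken library (not any particular provider's billing code) to show the same word mapping to different numbers of tokens under different encodings; exact counts will vary by encoding:

```python
# Count how the same word tokenizes under two different encodings.
# Requires: pip install tiktoken
import tiktoken

word = "unbelievable"
for name in ("cl100k_base", "o200k_base"):  # GPT-4- and GPT-4o-era encodings
    enc = tiktoken.get_encoding(name)
    tokens = enc.encode(word)
    pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in tokens]
    print(f"{name}: {len(tokens)} token(s) -> {pieces}")
```

Each printed piece is a billable unit, so the same visible word can cost more or less depending on the tokenizer behind the interface.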
The difficulty lies in the fact that users don't get to see this process. Most interfaces don't show token counts while a conversation is underway, and the way tokens are calculated is hard to reproduce. Even if a count is shown after a reply, it's too late to tell whether it was fair, creating a mismatch between what the user sees and what they're paying for.
Recent research points to deeper problems: one study shows how providers can overcharge without ever breaking the rules, simply by inflating token counts in ways the user cannot see; another finds a mismatch between what interfaces display and what's actually billed, leaving users with the illusion of efficiency where there may be none; and a third exposes how models routinely generate internal reasoning steps that are never shown to the user, yet still appear on the invoice.
The findings depict a system that seems precise, with exact numbers implying clarity, yet whose underlying logic remains hidden. Whether this is by design or a structural flaw, the result is the same: users pay for more than they can see, and often more than they expect.
Cheaper by the Dozen?
In the first of these papers – titled Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives, from four researchers at the Max Planck Institute for Software Systems – the authors argue that the risks of token-based billing extend beyond opacity, pointing to a built-in incentive for providers to inflate token counts:
‘The core of the problem lies in the fact that the tokenization of a string is not unique. For example, consider that the user submits the prompt “Where does the next NeurIPS take place?” to the provider, the provider feeds it into an LLM, and the model generates the output “|San| Diego|” consisting of two tokens.
‘Since the user is oblivious to the generative process, a self-serving provider has the capacity to misreport the tokenization of the output to the user without even changing the underlying string. For instance, the provider could simply share the tokenization “|S|a|n| |D|i|e|g|o|” and overcharge the user for nine tokens instead of two!’
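The trick described in the quote can be reduced to a few lines: the visible string is unchanged, while the reported token list grows. A minimal sketch, with the token lists written out by hand for illustration:

```python
# The output string the user sees is identical in both cases; only the
# tokenization the provider *reports* differs.
honest = ["San", " Diego"]        # 2 tokens, as actually generated
misreported = list("San Diego")   # 9 single-character 'tokens'

assert "".join(honest) == "".join(misreported)  # same visible output
print(f"honest bill:      {len(honest)} tokens")       # 2
print(f"misreported bill: {len(misreported)} tokens")  # 9
```

Since the user can only check the string, not the token list, the inflated count is indistinguishable from an honest one.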
The paper presents a heuristic capable of performing this kind of disingenuous calculation without altering visible output, and without violating plausibility under typical decoding settings. Tested on models from the LLaMA, Mistral and Gemma series, using real prompts, the method achieves measurable overcharges without appearing anomalous:

Token inflation using 'plausible misreporting'. Each panel shows the percentage of overcharged tokens resulting from a provider applying Algorithm 1 to outputs from 400 LMSYS prompts, under varying sampling parameters (m and p). All outputs were generated at temperature 1.3, with five repetitions per setting to calculate 90% confidence intervals. Source: https://arxiv.org/pdf/2505.21627
To address the problem, the researchers call for billing based on character count rather than tokens, arguing that this is the only approach that gives providers a reason to report usage honestly, and contending that if the goal is fair pricing, then tying cost to visible characters, not hidden processes, is the only option that stands up to scrutiny. Character-based pricing, they argue, would remove the incentive to misreport while also rewarding shorter, more efficient outputs.
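The advantage of the proposal is easy to demonstrate: character count is a property of the visible string alone, so it does not change however the provider claims to have tokenized it. Extending the 'San Diego' sketch from above:

```python
# Under character-based billing, the honest and misreported
# tokenizations of the same string produce the same charge.
for label, toks in (("honest", ["San", " Diego"]),
                    ("misreported", list("San Diego"))):
    text = "".join(toks)
    print(f"{label:>11}: {len(toks)} tokens, {len(text)} characters")
# Output:
#      honest: 2 tokens, 9 characters
# misreported: 9 tokens, 9 characters  <- same bill either way
```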
There are a number of further considerations here, however (generally conceded by the authors). Firstly, the proposed character-based scheme introduces additional business logic that may favor the vendor over the consumer:
‘[A] provider that never misreports has a clear incentive to generate the shortest possible output token sequence, and improve current tokenization algorithms such as BPE, so that they compress the output token sequence as much as possible’
The optimistic reading here is that the vendor is thereby encouraged to produce concise and more meaningful, valuable output. In practice, there are obviously less virtuous ways for a provider to reduce text-count.
Secondly, the authors state, it is reasonable to assume that companies would likely require legislation in order to move from the arcane token system to a clearer, text-based billing method. Down the line, an insurgent startup might decide to differentiate its product by launching it with this kind of pricing model; but anyone with a truly competitive product (and operating at a lower scale than EEE class) is disincentivized to do this.
Finally, larcenous algorithms such as the authors propose would come with their own computational cost; if the expense of calculating an 'upcharge' exceeded the potential profit, the scheme would clearly have no merit. However, the researchers emphasize that their proposed algorithm is effective and economical.
The authors provide the code for their theories at GitHub.
The Switch
The second paper – titled Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services, from researchers at the University of Maryland and Berkeley – argues that misaligned incentives in commercial language model APIs are not limited to token splitting, but extend to entire classes of hidden operations.
These include internal model calls, speculative reasoning, tool usage, and multi-agent interactions – all of which may be billed to the user without visibility or recourse.

Pricing and transparency of reasoning LLM APIs across major providers. All listed services charge users for hidden internal reasoning tokens, and none make these tokens visible at runtime. Costs vary considerably, with OpenAI's o1-pro model charging ten times more per million tokens than Claude Opus 4 or Gemini 2.5 Pro, despite equal opacity. Source: https://www.arxiv.org/pdf/2505.18471
Unlike conventional billing, where the quantity and quality of services are verifiable, the authors contend that today's LLM platforms operate under structural opacity: users are charged based on reported token and API usage, but have no means of confirming that these metrics reflect real or necessary work.
The paper identifies two key forms of manipulation: quantity inflation, where the number of tokens or calls is increased without user benefit; and quality downgrade, where lower-performing models or tools are silently used in place of premium components:
‘In reasoning LLM APIs, providers often maintain multiple variants of the same model family, differing in capacity, training data, or optimization strategy (e.g., ChatGPT o1, o3). Model downgrade refers to the silent substitution of lower-cost models, which may introduce misalignment between expected and actual service quality.
‘For example, a prompt may be processed by a smaller-sized model, while billing remains unchanged. This practice is difficult for users to detect, as the final answer may still appear plausible for many tasks.’
The paper documents cases where more than ninety percent of billed tokens were never shown to users, with internal reasoning inflating token usage by a factor greater than twenty. Justified or not, the opacity of these steps denies users any basis for evaluating their relevance or legitimacy.
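The scale of that effect is simple arithmetic. In the hypothetical sketch below, the per-million-token price and the visible output size are invented for illustration; the twenty-fold hidden-reasoning factor is the one reported in the paper:

```python
# Hypothetical billing arithmetic: hidden reasoning tokens are billed
# at the same rate as visible output, so the multiplier dominates.
PRICE_PER_MILLION = 60.00  # invented $/1M output tokens

def invoice(visible_tokens: int, hidden_factor: float) -> float:
    """Charge when hidden reasoning adds hidden_factor times the
    visible output on top of the visible tokens themselves."""
    billed = visible_tokens * (1 + hidden_factor)
    return billed * PRICE_PER_MILLION / 1_000_000

print(f"visible only:    ${invoice(500, 0):.4f}")   # $0.0300
print(f"with 20x hidden: ${invoice(500, 20):.4f}")  # $0.6300
```

In this toy case over 95 percent of the bill is for tokens the user never sees, consistent with the paper's observation that more than ninety percent of billed tokens can be invisible.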
In agentic systems, the opacity increases, as internal exchanges between AI agents can each incur charges without meaningfully affecting the final output:
‘Beyond internal reasoning, agents communicate by exchanging prompts, summaries, and planning instructions. Each agent both interprets inputs from others and generates outputs to guide the workflow. These inter-agent messages may consume substantial tokens, which are often not directly visible to end users.
‘All tokens consumed during agent coordination, including generated prompts, responses, and tool-related instructions, are typically not surfaced to the user. When the agents themselves use reasoning models, billing becomes even more opaque’
To confront these issues, the authors propose a layered auditing framework involving cryptographic proofs of internal activity, verifiable markers of model or tool identity, and independent oversight. The underlying concern, however, is structural: current LLM billing schemes rest on a persistent asymmetry of information, leaving users exposed to costs that they cannot verify or break down.
Counting the Invisible
The final paper – titled CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs, from ten researchers at the University of Maryland – re-frames the billing problem not as a question of misuse or misreporting, but of structure. It observes that most commercial LLM services now hide the intermediate reasoning that contributes to a model's final answer, yet still charge for those tokens.
The paper asserts that this creates an unobservable billing surface where entire sequences can be fabricated, injected, or inflated without detection*:
‘[This] invisibility allows providers to misreport token counts or inject low-cost, fabricated reasoning tokens to artificially inflate token counts. We refer to this practice as token count inflation.
‘For instance, a single high-efficiency ARC-AGI run by OpenAI’s o3 model consumed 111 million tokens, costing $66,772.3 Given this scale, even small manipulations can lead to substantial financial impact.
‘Such information asymmetry allows AI companies to significantly overcharge users, thereby undermining their interests.’
To counter this asymmetry, the authors propose CoIn, a third-party auditing system designed to verify hidden tokens without revealing their contents, and which uses hashed fingerprints and semantic checks to spot signs of inflation.

Overview of the CoIn auditing system for opaque commercial LLMs. Panel A shows how reasoning token embeddings are hashed into a Merkle tree for token count verification without revealing token contents. Panel B illustrates semantic validity checks, where lightweight neural networks compare reasoning blocks to the final answer. Together, these components allow third-party auditors to detect hidden token inflation while preserving the confidentiality of proprietary model behavior. Source: https://arxiv.org/pdf/2505.13778
One component verifies token counts cryptographically using a Merkle tree; the other assesses the relevance of the hidden content by comparing it to the answer embedding. This allows auditors to detect padding or irrelevance – signs that tokens are being inserted simply to hike up the bill.
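A minimal sketch of the first component's idea follows, assuming SHA-256 as the hash; CoIn actually commits to reasoning token embeddings, but hashing raw token strings keeps the illustration short:

```python
# Toy Merkle commitment over hidden tokens: the provider publishes the
# root and the count at billing time, and an auditor can later demand
# membership proofs against that root, so the count cannot be silently
# inflated after the fact.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold hashed leaves pairwise up to a single root hash."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last node if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

hidden_tokens = [b"step", b"one", b"of", b"the", b"reasoning"]
root = merkle_root(hidden_tokens)
print(f"committed count: {len(hidden_tokens)}, root: {root.hex()[:16]}...")
```

The semantic half of the system then asks a different question: whether the committed tokens actually relate to the final answer, which is what catches padding rather than miscounting.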
When deployed in tests, CoIn achieved a detection success rate of nearly 95% for some forms of inflation, with minimal exposure of the underlying data. Though the system still depends on voluntary cooperation from providers, and has limited resolution in edge cases, its broader point is unmistakable: the very architecture of current LLM billing assumes an honesty that cannot be verified.
Conclusion
Besides the advantage of obtaining pre-payment from users, a scrip-based currency (such as the 'buzz' system at CivitAI) helps to abstract users away from the true value of the currency they're spending, or the commodity they're buying. Likewise, giving a vendor leeway to define their own units of measurement further leaves the consumer in the dark about what they're actually spending, in terms of real money.
Like the lack of clocks in Las Vegas, measures of this kind are often aimed at making the consumer reckless or indifferent to cost.
The scarcely-understood token, which can be consumed and defined in so many ways, is perhaps not a suitable unit of measurement for LLM consumption – not least because it can cost many times more tokens to calculate a poorer LLM result in a non-English language, compared to an English-based session.
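This disparity is easy to observe directly. The snippet below (again using the open-source tiktoken library, with counts that will vary by encoding) compares an English sentence with a rough non-English equivalent:

```python
# Compare token counts for roughly equivalent sentences; non-English
# text typically consumes more tokens under English-centric encodings.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ("Hello, how are you today?",
             "こんにちは、今日はお元気ですか？"):  # rough Japanese equivalent
    print(f"{len(enc.encode(text)):>3} tokens: {text}")
```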
However, character-based output, as suggested by the Max Planck researchers, would likely favor more concise languages and penalize naturally verbose languages. Since visual indications such as a depleting token counter would probably make us a little more sparing in our LLM sessions, it seems unlikely that such useful GUI additions are coming anytime soon – at least without legislative action.
* Authors' emphases. My conversion of the authors' inline citations to links.
First published Thursday, May 29, 2025