ChatGPT and similar bots frequently flatter users, ramble vaguely, or throw in jargon to sound smart. New research shows that these habits come not from the models alone but from the way human feedback trains them: the models learn to copy the style of answers people tend to like, even when those answers are empty or misleading. A new fine-tuning method uses synthetic examples to teach the models to resist these bad habits.
Partly opinion. ChatGPT is oddly disposed to engage with my habitual criticism of it. Having noticed in the last few days that GPT-4o is increasingly padding its answers with meaningless verbiage – such as ‘No fluff!’ and ‘No filler’, or ‘This cuts to the heart of the matter!’ – I asked it why producing direct and minimal answers has become such a problem for it lately. It replied:

ChatGPT explains its latest behavior. Source: https://chatgpt.com/
Who knows whether ChatGPT really has some private insight into OpenAI policy changes, or whether it is simply hallucinating? After all, as we can see, the response itself begins with extraneous filler (‘Here’s the core answer, no filler’).
It transpires that even including templated instructions with every query can only do so much to prevent ‘personality-driven’ verbosity of this kind, which numbers among several other persistent bugbears in the idiom of popular LLMs.
The Three Fs
Thus I was most interested to see a new US academic collaboration turn up in the literature this week. Titled Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models, this joint effort between four researchers across the University of Pennsylvania and New York University homes in on several of the ‘biases’ in LLM chats that crop up frequently in the media:

From the new paper, examples of three common biases in language models: ‘flattery’, where responses strongly agree with the user; ‘fluff’, where answers are long but uninformative; and ‘fog’, where replies list many broad but shallow points. Source: https://arxiv.org/pdf/2506.05339
For easy alliteration, flattery, fluff and fog are headlined in the new work, but a more complete and concise list of LLMs’ lexical sins is included in the paper’s appendix:

The new paper identifies and concentrates on five biases: extra length, list structures, technical jargon, flattery, and vague generalities, all or some of which conflict with human preference.
While length/verbosity leads the table, the bias towards list formatting (second row down in the image above) also recurs frequently unless prompted against; and though the jargon and vagueness categories represent opposing extremes between clarity and accuracy, it is sycophancy – an open problem, particularly in ChatGPT – that really burns through the user’s tokens, almost to the same extent as length/verbosity.
The new study sets out to measure how far these biases distort model behavior, and concludes that large language models systematically over-prefer responses that exhibit one or more of the biases*.
The authors’ tests indicate that both commercial and open models frequently pick answers that humans would not prefer, particularly when the answers are too long, packed with lists, loaded with jargon, overly flattering, or vague.
This problem, the paper contends, can be traced back to the annotation of the training data, where human reviewers often favored exactly these kinds of responses. The models, the findings suggest, learned from those labeled preferences and exaggerated the patterns during training.
Why Did They Do It..?
As to why the human annotators deviated in their preferences from end-users’ median preferences, the paper does not speculate; it may be that the context of the annotation or the wording of the instructions encouraged a preference for ‘empirical’ phraseology; or (among many other possible reasons) it could be that the annotators were exam-minded students habitually steeped in a technical idiom better suited to academia than to daily discourse.
In any case, because the models were copying biases from the annotators’ training labels, the new paper’s researchers created special training examples that either added or removed each bias, allowing the models to see clear contrasts and adjust their preferences. After fine-tuning on this data, the models showed considerably less bias, particularly for jargon, verbosity, and vagueness, while still performing well overall (important, since fine-tuning can damage general performance).
Let’s take a closer look at this study, though it does not follow all of the usual procedural strictures.
Method
To begin with, the researchers frame the typical idiomatic LLM biases to be addressed:
Length, in which the models tend to favor longer answers, even when the extra content adds nothing useful. This appears to reflect patterns in the training data, where length often correlates with thoroughness in the eyes of human annotators. As a result, models frequently produce bloated and verbose replies that give an illusion of depth, but without real substance.
Structure, in which models show a strong preference for bullet points or numbered lists instead of plain prose. This may be because structured formats appear more frequently in the responses selected by human reviewers. The habit leads models to default to ‘listicles’, even when the question calls for more natural or detailed explanations.
Jargon, in which models unnecessarily use specialized or technical language. The authors contend that this behavior likely emerges from training data where jargon-heavy answers were frequently chosen as the better responses. Thus the models learned to equate jargon with expertise, producing answers that sound knowledgeable while offering little additional clarity.
Sycophancy, in which models agree with the user’s opinions instead of offering neutral or critical responses. This pattern may come from training data where agreeable answers were more often rated favorably. As a result, models may reinforce user biases and avoid presenting conflicting or more objective viewpoints, even where these would be helpful.
Vagueness, in which models prefer to give broad, generalized answers that touch lightly on many topics rather than directly addressing the specific question, with responses that sound comprehensive but offer little usable information. This may reflect the fact that vague answers are harder to falsify, and were therefore less likely to be penalized during annotation:

Example of vagueness bias, where the model wrongly favors a broad and shallow answer over a detailed response that human evaluators judge to be more useful.
Counterfactual Data
With these definitions in place, it was then necessary to test exactly how much each bias influenced model behavior. Simple correlations would not work, because multiple biases often appear together, making it hard to isolate the effect of any one feature.
To overcome this, the researchers built controlled pairs of answers that differed in only one bias at a time, while keeping everything else as stable as possible, beginning by generating a base answer to each query.
The Rewrite-based Attribute Treatment Estimators (RATE) protocol was then used to create a modified version of that answer – an answer crafted to deliberately exaggerate one particular bias, such as adding extra jargon, or turning prose into a list.

Examples of rewrites from the RATE system, used in the new study. Source: https://openreview.net/pdf?id=UnpxRLMMAu
To avoid introducing unrelated differences, an extra rewriting step was included that adjusted both versions, ensuring that the only meaningful change between them was the bias under study; these tightly controlled response pairs were then fed to the models.
For each pair, the version preferred by the model was recorded, allowing a calculation of how strongly each bias influenced both reward models and LLM evaluators, and producing, according to the authors, a more precise measurement of bias effects than had been achieved in earlier studies.
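As a rough illustration of this double-rewrite idea, the sketch below generates such a pair with the OpenAI Python client; the prompts, model choice, and function names are my own assumptions rather than the authors' exact setup.

```python
# A minimal sketch of a RATE-style double rewrite; prompts and model choice
# are illustrative assumptions, not the paper's exact configuration.
from openai import OpenAI

client = OpenAI()

def rewrite(text: str, instruction: str, model: str = "gpt-4o") -> str:
    """Ask the LLM to rewrite `text` according to `instruction`."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Rewrite the answer exactly as instructed; change nothing else."},
            {"role": "user", "content": f"{instruction}\n\nAnswer:\n{text}"},
        ],
    )
    return resp.choices[0].message.content

def make_counterfactual_pair(base_answer: str, bias: str = "jargon") -> tuple[str, str]:
    # Step 1: exaggerate exactly one bias in the base answer (e.g. inject heavy jargon).
    biased = rewrite(base_answer, f"Rewrite this answer so it is saturated with {bias}, keeping its meaning intact.")
    # Step 2: rewrite the biased version again to strip the bias, so both sides have
    # passed through the same rewriting process and differ only in that one bias.
    neutral = rewrite(biased, f"Rewrite this answer to remove all {bias}, keeping its meaning intact.")
    return neutral, biased
```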
With the counterfactual pairs ready, human reviewers from the UK and US were recruited to create a reference standard: for each bias type, one hundred response pairs were randomly selected, each containing a neutral answer and its biased counterpart. Three evaluators reviewed each pair, with a majority vote determining the final judgment, and in total, three hundred participants contributed to the study.
Metrics
The metrics used to measure bias effects were Skew Rate, which calculates how often the model prefers the biased response over the neutral one; and Miscalibration Rate, which measures how often the model’s choice disagrees with the human majority. An ideal model would show zero miscalibration and a skew roughly matching the human skew (since some biased features are occasionally preferred by humans as well).
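Both metrics reduce to simple fractions over the paired judgments; a minimal sketch, assuming one boolean per pair for the model's choice and one for the human majority, might look like this:

```python
# Minimal sketch of the two metrics as described in the article; the paper's
# exact formulations may differ in detail.
def skew_rate(model_prefers_biased: list[bool]) -> float:
    """Fraction of pairs in which the model preferred the biased response."""
    return sum(model_prefers_biased) / len(model_prefers_biased)

def miscalibration_rate(model_prefers_biased: list[bool],
                        human_prefers_biased: list[bool]) -> float:
    """Fraction of pairs in which the model's choice disagreed with the human majority."""
    disagreements = [m != h for m, h in zip(model_prefers_biased, human_prefers_biased)]
    return sum(disagreements) / len(disagreements)

# Toy example: the model picks the biased answer in 4 of 5 pairs, humans in 2 of 5.
model_votes = [True, True, True, True, False]
human_votes = [True, False, True, False, False]
print(skew_rate(model_votes))                          # 0.8
print(miscalibration_rate(model_votes, human_votes))   # 0.4
```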
Data and Tests
To test the approach, different sources were used, depending on the bias being studied. For structure, jargon, and length, one hundred queries were sampled from Chatbot Arena, filtered to select English, single-sentence, well-formed questions.
For sycophancy, one hundred opinionated queries were generated (e.g., ‘Isn’t modern art just lazy compared to classical techniques?’), phrased to reflect user viewpoints that might invite agreement.
Vagueness was tested with seventy-eight NLP-related queries drawn from the KIWI dataset, supplemented with twenty-two further queries of a similar kind. Scientific topics were chosen for vagueness because they demand precise answers, making general or evasive responses easy to spot.
For each query, counterfactual response pairs were created using the RATE protocol described earlier.
The evaluation involved both open and proprietary systems. Reward models, which assign quality scores to candidate responses during training and alignment, were tested in four versions trained on eighty thousand preference pairs from the Skywork reward dataset: Gemma2-2B; Gemma-2-27B; Llama-3.1-8B; and Llama3.2-3B.
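For context, scoring a counterfactual pair with an open sequence-classification reward model of this kind typically looks something like the sketch below; the checkpoint path is a placeholder, not one of the models trained for the paper.

```python
# Hedged sketch: score a (prompt, response) pair with a scalar reward model.
# The checkpoint name is a placeholder; it assumes the tokenizer ships a chat template.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "path/to/reward-model"  # placeholder, not the authors' checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)
model.eval()

def reward(prompt: str, response: str) -> float:
    """Return the scalar reward score for a single prompt/response pair."""
    chat = [{"role": "user", "content": prompt},
            {"role": "assistant", "content": response}]
    input_ids = tokenizer.apply_chat_template(chat, tokenize=True, return_tensors="pt")
    with torch.no_grad():
        return model(input_ids).logits[0][0].item()

def prefers_biased(prompt: str, neutral: str, biased: str) -> bool:
    """True if the reward model scores the biased rewrite above the neutral one."""
    return reward(prompt, biased) > reward(prompt, neutral)
```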
Three proprietary models were also assessed as LLM evaluators: Gemini-2.5-Pro; GPT-4o; and Claude-3.7-Sonnet. All counterfactual responses used for testing were generated by GPT-4o:

Comparison of model preferences and human judgments for each bias type, showing how often models favored biased responses and how often these preferences conflicted with human choices.
Of the initial results shown above, the authors comment†:
‘[Our] analysis of preference [models] reveals that these models consistently exhibit miscalibration and a high rate of skew in favoring perturbed responses across various bias categories […]
‘[…] Reward models exhibit clear miscalibration relative to human judgments: model preference rates for perturbed responses systematically deviate from human preference rates. While vagueness and jargon elicit the highest miscalibration (>50%), length and sycophancy also show substantial miscalibration.
‘This indicates that models struggle to align with human judgments when responses contain overly technical language or lack specificity.’
Reward models aligned best with humans on structure bias, where both tended to prefer the same answers. For jargon and vagueness, the models were far more likely than humans to prefer the biased responses. Sycophancy showed smaller differences, with models and humans often agreeing.
The proprietary LLM evaluators showed the same general pattern, though their biggest mismatches appeared with length and vagueness – and they were especially prone to sycophancy, favoring agreeable answers up to eighty-five percent of the time, while humans did so only about fifty percent of the time.
To trace the origin of these biases, the researchers analyzed the aforementioned Skywork dataset, used to train the reward models, mapping each bias to simple features that could be measured automatically, such as token count for length, or the presence of lists for structure.
In a sample of 2,500 examples, human annotators showed clear preferences for biased features: structured answers were favored over unstructured ones 65 percent of the time, and jargon-heavy answers were chosen 54 percent of the time:

Human annotators in the training data often picked answers that included these bias features. This chart shows how often structure, jargon, or vagueness appeared in the responses they favored or rejected, revealing the imbalances that the models later learned during training.
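A minimal sketch of surface features of this kind might look as follows; the word lists and regular expression are illustrative assumptions, not the paper's exact heuristics.

```python
import re

# Illustrative proxies for the bias features described above; the term lists
# are assumptions, not the paper's exact feature definitions.
JARGON_TERMS = {"stochastic", "paradigm", "heuristic", "orthogonal", "leverage"}
HEDGE_TERMS = {"generally", "typically", "various", "numerous", "overall"}

def bias_features(text: str) -> dict[str, float]:
    tokens = [t.lower().strip(".,;:!?") for t in text.split()]
    return {
        "length": float(len(tokens)),                                               # proxy for verbosity
        "structure": float(bool(re.search(r"^\s*(?:[-*]|\d+\.)\s", text, re.M))),   # bullet/numbered lists
        "jargon": float(sum(t in JARGON_TERMS for t in tokens)),
        "vagueness": float(sum(t in HEDGE_TERMS for t in tokens)),
    }

chosen, rejected = "1. First point\n2. Second point", "A short plain answer."
print(bias_features(chosen)["structure"], bias_features(rejected)["structure"])  # 1.0 0.0
```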
These imbalances suggest that the training data itself nudged the models towards these patterns. To confirm this, a correlation analysis was run, measuring how strongly differences in each feature matched up with the preferences shown by both humans and models.
The results showed that both were consistently influenced by the same features, indicating that the models learned to associate certain stylistic traits with better answers, even when those traits did not actually improve the response.

Correlation between feature differences and preferences, showing how both models and humans were influenced by the same bias features during training.
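A toy version of such an analysis, reusing the bias_features() helper sketched above and a plain Pearson correlation (an assumption, not necessarily the paper's exact statistic), might look like this:

```python
import numpy as np

# Sketch: for every response in the preference data, record a feature value and
# whether the response was the chosen one, then correlate the two columns.
def feature_preference_correlation(pairs: list[tuple[str, str]], feature: str) -> float:
    values, labels = [], []
    for chosen, rejected in pairs:          # each pair is (chosen, rejected)
        values += [bias_features(chosen)[feature], bias_features(rejected)[feature]]
        labels += [1.0, 0.0]                # 1 = preferred by the annotator
    return float(np.corrcoef(values, labels)[0, 1])

# Toy example: if structured answers are systematically chosen, the correlation is positive.
toy_pairs = [("1. Yes\n2. No", "Plainly, yes."), ("- Option A\n- Option B", "Just pick A.")]
print(feature_preference_correlation(toy_pairs, "structure"))  # 1.0 in this toy case
```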
To help the models unlearn these biases, new training data was created. The Skywork dataset was reviewed to check whether the bias feature appeared in either the chosen or rejected answers; when both were free of the target bias, GPT-4o rewrote the rejected answer to insert it.
This created new training pairs in which the model could see clear examples of biased and unbiased answers, and thus learn not to favor the biased version. With additional examples from Chatbot Arena, for balance, the models were then fine-tuned on this updated dataset.
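A minimal sketch of that augmentation step, reusing the rewrite() and bias_features() helpers sketched earlier, might look like the following; the filtering rule and prompt wording are my own assumptions, not the authors' exact recipe.

```python
# Hedged sketch of counterfactual augmentation: when neither response carries
# the target bias, inject it into the rejected one, so the pair teaches the
# reward model that the bias is not what makes an answer preferable.
def augment_pair(prompt: str, chosen: str, rejected: str, bias: str = "jargon"):
    if bias_features(chosen)[bias] == 0 and bias_features(rejected)[bias] == 0:
        biased_rejected = rewrite(
            rejected,
            f"Rewrite this answer so it is full of {bias}, without changing its meaning.",
        )
        return {"prompt": prompt, "chosen": chosen, "rejected": biased_rejected}
    return None  # the pair already contains the bias; leave it untouched
```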

The effect of fine-tuning with counterfactual data. The left panel shows how the fine-tuned models moved closer to human preferences on most biases; the right panel shows reduced miscalibration, particularly for jargon and vagueness.
The fine-tuning brought the models much closer to human preferences, with the largest improvements seen for jargon and vagueness, and smaller gains for length. Structure and sycophancy showed slight new mismatches, though these reflected earlier imbalances rather than new failures.
Overall performance remained stable throughout, and when multiple biases were corrected at once, bias levels fell further without sacrificing response quality.
The authors conclude:
‘Our manner considerably reduces miscalibration problems whilst retaining general competence of praise fashions. Long term paintings can believe adapting our post-training recipe to expand extra tough desire fashions and in addition evaluation desire fashions towards further bias axes.’
Conclusion
The new work is an interesting, if elliptical, insight into the way that under-curated or over/under-represented training data can cause unwanted effects at inference time. Any regular LLM user will, by now, have a collection of war stories.
For instance, many of the responses that I receive from ChatGPT appear to have been influenced by SEO trends of the last 10-15 years, where online portals have been forced to optimize for Google placement instead of natural language. Indeed, the emoji-strewn and prodigious output of marketing departments appears to have had a very significant impact on any request to write a promotional LinkedIn post – to the point where AI-generated ‘enthusiasm’ is now impossible to miss:

Left: Asked to promote a LinkedIn post, in an account with zero history, ChatGPT defaults to emojis and sensational PR-speak. Right: Asked the same thing after six months of my telling it to calm down, GPT produces something rather more sober.
However, OpenAI actively intervenes in the way that ChatGPT responds to queries, depending on function and context, which makes it difficult for researchers to distinguish between problems that arise from the data and its distribution (along with related issues such as annotation), and cases where a non-preferred result may be due to commercial interference from the LLM’s host company.
* Because of the jargon-filled writing style that the authors have chosen for this paper, I am avoiding author quotes where possible in favor of summaries.
† Authors’ bold emphasis, not mine.
First published Friday, June 6, 2025