The Ancestor's Error
ChatGPT cited six fake cases in Mata v. Avianca with the same confidence it would have used for real ones. The verbal certainty of an LLM is roughly uncorrelated with whether the answer is true.
In June 2023 a federal judge in the Southern District of New York fined two lawyers five thousand dollars for citing six legal cases that did not exist. The lawyers had asked ChatGPT to research a brief in Mata v. Avianca. The model returned six citations with case names, court captions, dockets, judges' names, and confident summaries of holdings. None of the six cases existed. When opposing counsel pointed this out, one of the lawyers asked the model whether the cases were real. The model said yes. He asked it to provide the full text. It produced full opinions with quoted reasoning. None of those existed either.
I read the sanctions order when it came out. What stayed with me was the prose. The order is full of sentences that read like a transcript of someone watching a magic trick they have not seen before. The judge keeps describing the model's behavior with words like "confident," "specific," "precise." He cannot quite figure out what to do with the fact that the most untrue answers were delivered with the smoothest fluency. That is the part I want to write about.
There is now a research literature on this. A team led by Miao Xiong presented a paper at ICLR in 2024 on confidence elicitation in language models. The headline finding is that LLMs, when asked to verbalize their confidence, are systematically overconfident, in a way the authors describe as imitating human patterns of expressing certainty without the underlying calibration. The verbal confidence clusters near the top of the scale across questions of varying difficulty. To put this in concrete terms, imagine a student who answered "I am ninety-five percent sure" on every question of a test, and got sixty percent of them right. The score is fine. The metacognition is broken. That is roughly where current models sit, and the practical effect is that the verbal confidence carries very little information about whether the answer is correct.
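To put a number on that gap, here is a minimal sketch in Python of the comparison that defines calibration: mean stated confidence against observed accuracy. The records are invented for illustration, not data from the paper; they are chosen to mirror the student above.

```python
# Minimal calibration check: does stated confidence track observed accuracy?
# The records are invented for illustration; they are not data from the paper.

records = [
    # (verbalized confidence, whether the answer was correct)
    (0.95, True), (0.95, False), (0.95, True), (0.95, True),
    (0.90, False), (0.95, True), (0.85, False), (0.95, True),
    (0.90, False), (0.95, True),
]

confidences = [conf for conf, _ in records]
correct = [ok for _, ok in records]

mean_confidence = sum(confidences) / len(confidences)
accuracy = sum(correct) / len(correct)

# A calibrated system has mean stated confidence close to observed accuracy.
# The gap is the simplest single-number measure of overconfidence.
print(f"mean stated confidence: {mean_confidence:.2f}")             # 0.93
print(f"observed accuracy:      {accuracy:.2f}")                    # 0.60
print(f"overconfidence gap:     {mean_confidence - accuracy:+.2f}")  # +0.33
```

Calibration studies typically refine this with per-bin comparisons rather than a single average, but the shape of the problem is already visible here: the confidence column barely moves while the accuracy column does.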
This is a category error in the most literal sense. Human confidence and AI confidence are produced by different processes that share a vocabulary. A human expert who says "I am highly confident" is reporting on a metacognitive state shaped by years of being right and wrong about similar problems. The confidence is informative because it correlates, in a calibrated person, with track record. An LLM that produces the string "I am highly confident" is producing a token sequence that pattern-matches the high-confidence register in its training data. There is no track record behind the words. There is, in the technical sense the term has when applied to humans, no metacognition at all.
The Mata case is what this looks like in production. The model did not know it did not know. The expressed confidence had the same shape it would have had if the cases were real. The lawyer treated the confidence as the signal a human expert's confidence is. It was not.
Now overlay a second finding on top of the first. A 2024 paper on human-AI reasoning measured how participants evaluated their own performance after using AI tools on LSAT problems. Participants scored three points higher than a norm population. When asked to assess their own performance, they overestimated it by four points. Participants with higher AI literacy were less accurate in their self-assessments, not more. The users who knew the most about the systems were the most confident in their own judgments and the least precise about them. High task accuracy combined with inflated metacognition is the worst possible combination for downstream judgment under uncertainty. They are right by a thin margin. They believe they are right by a wide one. The judgment they bring to the next problem starts from there.
The decision-support theory of AI assumes the human can distinguish between reliable and unreliable system outputs. It assumes the system's confidence is one of the cues the human uses. The findings above break the theory at its joints. Confidence is not a reliable cue. The human is overconfident on top of an unreliable cue. The closed loop has no external reference.
Where I land on this is unhappy. We have spent centuries developing institutions for evaluating human confidence. Credentials, track records, peer review, cross-examination. Each of these is a mechanism for separating the confidence that earns trust from the confidence that performs it. We have no equivalent apparatus for AI confidence, because the apparatus we built was for beings who could, in principle, know what they know.
The lawyers in Mata v. Avianca were sanctioned for trusting an oracle whose words had no underlying reasoning. They were operating on the heuristic everyone in their profession has been trained on, which is that fluent specific writing is, at the margin, more reliable than vague hedged writing. The heuristic was right for thirty years. It is wrong now, and is wrongest at exactly the cases where the model has nothing to say.
The discipline I find myself recommending to clients is to treat AI confidence as data about the system's training distribution and not as evidence about the world. The model's high confidence tells you the question is in a region where the training data was dense and self-consistent. The model's hedging tells you the training data was sparse or contradictory. Neither tells you the answer is correct. The tell, in either case, is about the model. It is not about the question.
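Here is what that posture looks like as a minimal sketch in Python. The names are hypothetical (verify_in_docket_database stands in for whatever external source actually holds the record); the point is structural: the model's stated confidence is kept as metadata about the model, and the decision to rely on a citation rests entirely on an external check.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    case_name: str
    docket: str
    model_confidence: float  # what the model said about itself, kept for auditing


def verify_in_docket_database(citation: Citation) -> bool:
    """Hypothetical external check, e.g. a lookup against a court docket system.

    This is the only step that produces evidence about the world rather
    than about the model's training distribution.
    """
    raise NotImplementedError("wire this to a real legal database")


def accept(citation: Citation) -> bool:
    # The model's confidence never enters the accept/reject decision.
    # High confidence does not lower the bar; hedging does not raise it.
    return verify_in_docket_database(citation)
```

The design choice worth noticing is the one the Mata workflow inverted: model_confidence is stored but never consulted, while the external lookup is mandatory.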
The lawyers in Mata should have read the citations. The citations would have been the test. The model's confidence was what persuaded them there was no test to run. We have not yet developed the discipline of treating fluency as a warning rather than an endorsement. Until we do, the inverse-confidence law applies. The signal we evolved to trust is the signal we should distrust most, in the cases that matter most, by exactly the mechanism that makes those cases the ones that matter.