Air France 447 and a 2025 Polish endoscopy trial point at the same trap. The more reliable the system, the more catastrophic its absence becomes.
In June 2009 an Air France A330 fell out of a clear sky over the Atlantic with all 228 souls aboard. The captain, when he came back from his rest break, did not recognize that his airplane was in a stall. The cockpit voice recorder caught the stall warning, which at one point sounded continuously for fifty-four seconds. Three pilots heard it. None of them did the thing the warning was telling them to do. The airplane fell for three minutes and thirty seconds. Then the water.
The number from this case that stayed with me, reported by William Langewiesche in Vanity Fair in 2014, is from the captain's logbook. He had flown 346 hours in the six preceding months. Of those, by Langewiesche's count, his hands had been on the controls for about four hours, in the takeoffs and the landings. For the other 342 hours the airplane was flown by an autopilot that had not, in his career to date, failed.
The autopilot was good. The autopilot was the problem.
Lisanne Bainbridge gave this its name in a 1983 essay called "Ironies of Automation." An autopilot reliable enough to be trusted is reliable enough to make human supervision evaporate. The pilot who is not flying is the pilot who is not learning. When the autopilot finally drops the airplane in the pilot's lap, the pilot is the same pilot, but the airplane is in a state the pilot has spent six months not encountering.
This is the pattern I keep watching now in domains that don't crash. Or that crash quietly enough that no one sends a recovery ship.
Last summer The Lancet Gastroenterology and Hepatology ran a study from a Polish multicenter trial called ACCEPT. Four endoscopy centers had adopted an AI polyp-detection tool at the end of 2021 and begun running colonoscopies randomly assigned to with-AI or without-AI. What stuck out to me was the sub-finding. Three months after the centers turned the AI on, the adenoma detection rate for the without-AI procedures fell from 28.4 percent to 22.4 percent, an absolute drop of six percentage points. The same endoscopists, doing the same task without the AI, were detecting at least one adenoma in six fewer patients per hundred than they had three months earlier. They had not retired. They had not gotten worse at colonoscopy in any general sense. They had become a slightly different kind of endoscopist, the kind whose visual attention had reorganized around a tool that was, in that procedure, no longer present.
Six points is not a lot. It is also not nothing. A drop of that size in adenoma detection rate, applied across enough patients, has been correlated in earlier work with a measurable increase in interval colorectal cancers. The cost is real and slow. It is also distributed across a population thinly enough that no single patient or doctor will ever attribute the cancer to the way the endoscopist learned to look in 2022.
Here is what makes this a trap and not a tradeoff.
The endoscopist who refuses the AI is, this year, slightly worse. Their detection rate without the tool is lower than that of the colleague who uses it. The patient is statistically better off in the room with the AI-using doctor. This year. The ratchet runs from there. By next year the AI has improved. The non-AI colleague is further behind. By the year after, the non-AI colleague has retired, the AI-using cohort is the entire field, and the field's no-AI baseline has eroded in a way that no one is measuring because no one has the comparison group anymore.
This is the part that bothers me when I sit with it for too long. The system works. The system working is what builds the dependency. And the dependency is invisible until the system is unavailable for some reason. A vendor outage. A regulatory pause. A network partition in a hospital where the colonoscopes still work but the model does not. And a generation of clinicians realizes it has trained itself to look at a screen that is not on.
You hear this argument and your reflex is to want a workaround. Periodic AI-off training. Adversarial drills. Mandatory unmediated practice every quarter. These are good ideas and I've argued for them. They will not work as a steady state. The reason any reliability program fails over a long enough horizon is the reason this one will. The cost of the drill is paid every quarter. The cost of skipping the drill is paid once, in twenty years, by a different cohort. No CFO has the time horizon. No regulator will hold the line that long. The math runs against the discipline, every quarter, until it doesn't, and then we are over the Atlantic.
I do not have a way out of this and I am suspicious of anyone who claims they do. The honest version of the strategy advice runs like this. Pick the few processes where you can afford not to have the AI. Protect those. Accept that everything else will erode. Watch where the erosion lands. Do not be the firm that finds out, in 2034, that the people who could fly without the autopilot have all retired and the autopilot has just dropped the airplane in your lap.