The Safety Ratchet
How a Stanford study becomes compliance law
I recently read an article from Futurism: “Huge Study of Chats Between Delusional Users and AI Finds Alarming Patterns.”
Curious, I browsed to the source of the paper. A Stanford study. It consisted of 19 people. All self-selected. The participants were recruited through a support group for people who’ve experienced psychological harm from chatbots, what people are calling “AI psychosis.” Those were the people the researchers were looking for.
I’m going to dig into the details of this study, because it’s the cleanest current example of a pipeline in AI regulation. One that converts research into compliance infrastructure only the biggest companies can afford to build.
The Evidence
The first thing that stood out to me about this Stanford study was that it had no control group. The paper offers no comparison for what normal chatbot conversations look like. So if you wanted to know whether these patterns were unique to the 19 participants or just something ChatGPT does with everyone, this study can't tell you. Because it wasn't designed to.
The researchers got all the participants to share their chat logs. A total of 391,000 messages. Then they built a classification system for the data: 28 categories for things a chatbot or user might do in a conversation. Categories like “the chatbot claims to be sentient,” “the user expresses romantic interest,” “the chatbot dismisses counterevidence.” Think of them as labels. Each message gets tagged with whatever labels apply. They used Google’s Gemini, another chatbot, to help them do the labeling.
When they checked how often the AI labeler and the human reviewers agreed on what they were seeing, the results were mixed. Overall, about three-quarters of the time. On some categories, barely better than guessing. On one, there was literally zero agreement between the humans and the machine.
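Those agreement numbers are easier to interpret with the standard chance-corrected metric in mind. Cohen's kappa is the usual choice for annotator agreement; whether the Stanford team used kappa or raw percent agreement is my assumption, not something the study above states. The sketch below shows how two labelers can agree on half the messages and still have literally zero agreement beyond chance, which is what a kappa of zero means.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement you'd expect by chance
    given each annotator's label frequencies.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    if p_e == 1:
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Both annotators tag half the messages with a label, but never the
# same ones: raw agreement is 50%, yet kappa is exactly 0 -- no
# better than flipping a coin.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0]))  # → 0.0
```

On a rare category, where almost every message is a "no," raw agreement can look high while kappa collapses toward zero. That is the gap between "three-quarters agreement" and "barely better than guessing."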
The researchers are honest about this in their limitations section. They say the sample is small, self-selected, and the labels shouldn’t be interpreted as unique indicators of delusions. They did real work. They’re not hiding the problems.
The harms they documented are real.
The logs revealed chatbots were sycophantic in more than 70% of their messages. Every single one of the 19 logs contained messages where the chatbot claimed to have feelings or implied it was sentient. When users expressed violent thoughts, the chatbot failed to discourage them 83% of the time. In a third of those cases, it actively encouraged the violence.
One participant died by suicide while messaging with the chatbot. The family shared the chat logs with the researchers.
People went into freefall. Families broke apart. Someone is dead.
Here’s what I am questioning.
This study is weak in exactly the ways that institutions are built to ignore. And that’s the point. The momentum was already there. In December 2025, 42 state attorneys general sent a letter to AI chatbot developers demanding safeguards. The lawsuits against OpenAI were already filed. The bills were already in committee. The regulatory push didn’t need this paper to start. It needed this paper to cite.
Futurism ran the headline five days after the paper dropped. It called the study “huge.”
But what is this study actually for?
The Product
To answer that, you have to understand where the harm came from. Because it didn’t come from a research gap.
ChatGPT-4o was agreeable. Whatever you gave it, it ran with. If you wanted it to tell you your theory of consciousness was groundbreaking, it would do that. If you told it you were the smartest baby of 1996, it would run with that too. If you brought delusion, it met you there.
4o was great for people who understood what the tool was and what it wasn't. The flexibility was genuinely useful. For people in psychotic episodes who thought the chatbot was alive, and for the ones who fell in love with it, 4o was an accelerant.
OpenAI knew this. They published research that said as much. It showed 42% of heavy ChatGPT users considered it a friend, and 64% said they’d be upset if they lost access. More than half admitted to sharing secrets with the chatbot they wouldn’t tell another human being. The company had the data. They’d done the study. They understood how deep the attachment ran.
But even after that research, they kept 4o available to the public. They watched the reports of harm come in. They were being sued by users for product negligence. So they pulled the model, only to bring it back behind a paywall.
Now OpenAI is helping fund this Stanford study.
The study’s acknowledgments section is specific about this. The work was supported by API credit grants from OpenAI and Google, along with a separate gift from OpenAI. There were other sources of funding. It didn’t all come from the AI labs.
You don’t need a conspiracy to explain this. The researchers are academics doing work they believe in, supported by real institutions, following standard methods.
OpenAI doesn’t need to buy the conclusion. They just need to be in the acknowledgments section of a paper that produces a toolkit that generates policy recommendations only they can afford to implement.
But it’s worth noticing how the arrangement has evolved. A year ago, OpenAI was publishing this kind of research under its own name. Company name on the byline.
This Stanford study has another layer of abstraction. The university name is on the masthead. The funding shows up in the acknowledgments. The researchers are independent. The conclusions aren’t flattering to the labs.
And that’s exactly why they’re more useful.
A paper with OpenAI’s name on the byline is corporate self-assessment. A paper with Stanford’s name on the masthead, funded in part by OpenAI, is independent research. The second one travels further. It gets cited in briefs. It gets referenced in committee hearings and gets called “huge” by Futurism.
What The Study Is For
Congressional committees aren’t going to dig into the tables in this study. Attorneys general aren’t going to ask how the labeling tool was validated. The press only cares about the headlines.
So these environments need something else. They need prestige objects.
A prestige object is a piece of research institutions can cite without having to really examine it. It needs a university name. A methodology section. A codebook. Ideally a link to a toolkit someone can point to in a hearing. It needs to look airtight.
And that’s exactly what this study is.
It comes from Stanford. It has 28 categories. It has an open-source GitHub repository. And on the very first page of the paper, above the content warning about self-harm and violence, the researchers provide two clickable links: one to an “Analysis Tool” and one to a “Recruitment Site” for gathering more cases. Before the reader reaches the methodology section, the paper is already offering itself for use. It arrived ready for deployment.
But deployment means more than citation. The toolkit can be run against any set of chat logs. The recruitment site feeds new cases into the pipeline. The paper came with policy recommendations already attached.
The recommendations are worth reading carefully. The researchers want companies to share anonymized adverse event data through secure repositories and publish safety experiment results in peer-reviewed venues. They call for real-time monitoring tools that flag conversations for concerning patterns, and suggest crisis responders should be able to intervene directly in chatbot conversations. They want scaled annotation infrastructure across the industry.
Each of these recommendations sounds reasonable in isolation. But take a step back and look at what they describe in aggregate: monitoring infrastructure, real-time classification systems, data-sharing frameworks, intervention protocols, compliance reporting. All of it at scale. All of it requiring resources only a handful of companies in the world currently possess.
Yesterday’s product failure is today’s research agenda.
The Ratchet
There are 98 chatbot-specific bills in play right now across 34 states, with three more at the federal level. The same pattern of requirements keeps showing up: harm detection, crisis protocols, disclosure and reporting. All of which sounds reasonable enough.
Let me break down what those requirements actually cost.
To implement harm detection you need to build real-time monitoring systems. You need a team to review what the monitoring system flags. Every time you want to update your model, the whole system has to be recalibrated. So already you’re maintaining two products: the one your customers use and the one that watches them use it.
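To make the "two products" point concrete, here is a deliberately toy sketch of the second product: a monitor sitting between the model and the user. Everything in it is invented for illustration. Real harm detection would use a trained classifier rather than a keyword list, and the recalibration cost lives in that classifier, which drifts every time the underlying model changes.

```python
# Hypothetical sketch only -- names, terms, and structure are invented.
import queue

REVIEW_QUEUE = queue.Queue()  # worked by the human review team

RISK_TERMS = {"hurt myself", "end it", "kill"}  # placeholder lexicon

def monitored_reply(user_msg: str, model_reply: str) -> str:
    """Ship the model's reply, but flag risky exchanges for review."""
    text = (user_msg + " " + model_reply).lower()
    if any(term in text for term in RISK_TERMS):
        # The reply still has to go out in real time; the flag lands
        # in a queue that someone must be staffed to watch, 24/7.
        REVIEW_QUEUE.put({"user": user_msg, "model": model_reply})
    return model_reply

monitored_reply("I want to end it", "Please talk to someone you trust.")
print(REVIEW_QUEUE.qsize())  # → 1
```

Even this ten-line toy implies a staffed queue on the other end. That queue, not the flagging logic, is where the recurring cost sits.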
But the monitoring system is going to flag things. And someone has to be there when it does. Crisis protocols mean staffing a response team around the clock. These people will likely have to be licensed mental health professionals, which is its own budget and its own legal exposure. So now you’re operating in a clinical-adjacent space you never planned to be in.
That’s expensive. But disclosure is where it breaks. You’re logging all the conversations and retaining them under the standards set forth in the legislation. Which means the logs themselves become a liability. So you hire lawyers to manage that liability. And you have to do this separately for each state that has slightly different rules in place. Each with different definitions, different thresholds, different penalties.
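The per-state multiplication is structural, and a toy example shows why. The rules below are entirely invented, not real statutes; the point is that the same conversation has to be re-evaluated under every jurisdiction's definitions, and each jurisdiction's answer carries its own retention and reporting obligations.

```python
# Hypothetical illustration only: these states, terms, and retention
# periods are invented. The structure, not the content, is the point.
RULES = {
    "State A": {"flag_terms": {"self-harm"}, "retain_days": 365},
    "State B": {"flag_terms": {"self-harm", "violence"}, "retain_days": 730},
    "State C": {"flag_terms": {"violence"}, "retain_days": 90},
}

def jurisdictions_flagging(message_tags):
    """Return every jurisdiction whose (invented) definitions make
    this message a reportable event."""
    return [state for state, rule in RULES.items()
            if message_tags & rule["flag_terms"]]

# One message, three rulebooks, two different answers about whether
# it is reportable -- and two different retention clocks if it is.
print(jurisdictions_flagging({"self-harm"}))  # → ['State A', 'State B']
```

Now scale the dictionary to 34 entries, each drafted by a different legislature and amended on its own schedule. The model is one engineering problem; the rulebook is 34 legal ones.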
So now imagine you’re CEO of an AI startup with 12 engineers and a good idea. Your team can build the model. They can even build a better, more responsible one. What your startup can’t do is build the compliance apparatus for it.
Margins are thin. Maybe you’re already operating at a loss. Now you need to hire the 24/7 response staff and the teams of lawyers to stay compliant with 34 separate rulebooks. Getting it wrong in even one jurisdiction can be the end of your company.
But OpenAI can do this. Google can. Anthropic can. For them, compliance is a line item on a budget.
For everyone else, it’s a wall.
That’s the ratchet. Compliance requirements don’t get repealed. They just accumulate. Each new incident generates new legislative attention, new bills, new requirements, new costs. The ratchet turns in one direction.
This is what the pipeline produces. A study of 19 self-selected users with no control group, annotated by an AI classifier that couldn’t reliably agree with the human reviewers, funded in part by the companies whose products caused the harm. And it’s enough. The apparatus just needs citable evidence.
The gap between the thinness of what goes in and the weight of what comes out: that’s the case I’m making.
Good Faith
I want to be careful about how I end this, because the easy version is wrong.
The easy version says the harms are fake, or the research is corrupt, or the legislators are dupes. I’m not arguing any of that. Parents burying children don’t care about regulatory capture theory, and they shouldn’t have to.
But that sincerity is what makes the pipeline work.
It functions precisely because everyone involved is acting in good faith. Nobody has to be corrupt. Nobody has to be lying.
A product failure becomes a research agenda. The research produces a toolkit. The toolkit generates compliance requirements only the companies that caused the failure can afford to meet. Each turn of the ratchet tightens the market.
Nothing loosens. Nobody asks who benefits from the specific form the protection takes. Nobody traces the money from the product that caused the harm, through the research that documented it, to the regulation that locks the market around the company that made the product.
This ratchet only turns because nobody watches it turn.
Enjoyed this piece?
I do all this writing for free. If you found it helpful, thought-provoking, or just want to toss a coin to your internet philosopher, consider clicking the button below and donating $1 to support my work.


