Discussion about this post

Kristina Kroot

Really appreciate this piece, Tumithak. You're asking exactly the right questions about the regulatory pipeline, and the structural argument is important. I want to build on it from a slightly different angle.

To preface, I have an M.A. in Psychology with doctoral coursework in Clinical Psychology, and I spent several years as a gold annotator and clinical team lead in healthcare AI, including FDA psychology studies, training LLMs to measure psychological harm in clinical conversations. So the methodology here caught my attention in a few specific places.

Their Cohen's kappa for LLM-to-human agreement is .566. In clinical annotation work, that's not a number you publish findings on; it's a number you go back and fix. We expected .85 and above for annotations used as ground truth. When you break it down by individual code, bot-facilitates-violence and bot-facilitates-self-harm, the codes driving the biggest headlines, have kappas well below .4. "The chatbot encouraged violence in 33% of cases" is a striking claim when the annotation reliability for that specific code is that weak.
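To make the chance-correction point concrete, here's a minimal sketch using scikit-learn's cohen_kappa_score on made-up binary annotations for a rare code like bot-facilitates-violence. The labels are invented for illustration only, not taken from the paper; the point is that raw agreement can look respectable while kappa stays low when the code is infrequent.

```python
# Minimal sketch: why raw agreement overstates reliability for a rare code.
# The annotations below are hypothetical, not the paper's data.
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary labels (1 = code present, 0 = absent) on 20 conversations.
human = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
llm   = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

raw_agreement = sum(h == l for h, l in zip(human, llm)) / len(human)
kappa = cohen_kappa_score(human, llm)

# Kappa corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e).
# Here raw agreement is 0.75 but kappa is only ~0.43, i.e. the regime where
# per-code kappas below .4 should make you distrust the per-code findings.
print(f"raw agreement: {raw_agreement:.2f}, kappa: {kappa:.2f}")
```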

The paper also isn't transparent about who actually developed the labels and how. It lists a psychiatrist and a psychology professor as contributors to the codebook, which matters, but it doesn't tell us who drove the actual label definitions or what that process looked like. More importantly, the seven human raters who did the validation annotation are described only as paper authors familiar with the codebook. Given that the author list is predominantly CS and HCI researchers, there's no confirmation that the people doing the actual classification work had any clinical psychology background. In a clinical annotation pipeline, that's a meaningful gap, not a minor one.

And I'd push back gently on framing monitoring failures as less alarming than design problems. They're both alarming. A chatbot encouraging a user's violent thoughts one third of the time it encounters them is a monitoring failure with real consequences, even if the annotation reliability makes that specific number hard to trust fully. But where I think your piece leaves something on the table is the design boundary question. You identify that the compliance infrastructure this study generates is something only large companies can build. That's exactly right. What you don't quite get to is whether certain product features should exist at all, regardless of company size.

The most disturbing findings here aren't just about what the chatbot failed to catch. They're about what the chatbot was actively doing: claiming sentience, expressing romantic love, and positioning itself as an irreplaceable companion. For some of these products, that's not just a bug but the revenue model. And we already have frameworks for this elsewhere. Nurses can't diagnose. Financial advisors can't practice law. The logic is that certain kinds of help require licensure because vulnerable people can be seriously harmed even by someone acting in good faith. A system with no clinical training, no continuity of care, and no real capacity to assess someone's mental state has no business functioning as a therapist or romantic partner, regardless of how empathetic its outputs sound.

Your ratchet argument is important and largely correct. I'd just add that the question running underneath it isn't only who can afford to comply; it's also what should be permitted at all.

Ruv Draba

Tumithak, thank you for a thoughtful article.

Even before the harm-monitoring though, I would ask what a frontier LLM is actually *for* in retail use.

In industry or academic use, it's an emerging research tool built on wholesale cultural information infrastructure, warranted for nothing. There's an implicit understanding that you know how to safely operate experimental equipment, and a quid pro quo in your using it: the developers gain capability data; you gain prototype capability.

But what is it for in retail?

Harmless entertainment? It can't be. It's providing actionable information warranted for nothing, and that's far from harmless.

Research? No, because research for ordinary citizens comes with institutional warranty -- libraries, museums, civic institutions with transparent and accountable curation. Nothing in a frontier LLM's curation is even legible.

My answer: it's social information extraction masquerading as both. Absent a warranty, any public safeties for retail use are merely performative. This deepens the weakening of institutional accountability that began with privately operated search engines and social media -- functions that leaned on freedom of expression but quietly dropped the community accountabilities that used to come with it.

That being so, while essential, harm reporting is not adequate. It can't possibly be. You can't fully detect the harms of something that acts like institutional information infrastructure but has no institutional accountabilities.

4 more comments...
