6 Comments
Kristina Kroot

Really appreciate this piece, Tumithak. You're asking exactly the right questions about the regulatory pipeline, and the structural argument is important. I want to build on it from a slightly different angle.

To preface, I have an M.A. in Psychology with doctoral coursework in Clinical Psychology, and I spent several years as a gold annotator and clinical team lead working in healthcare AI, including FDA psychology studies, training LLMs to measure psychological harm in clinical conversations. So the methodology here caught my attention in a few specific places.

Their Cohen's kappa for LLM-to-human agreement is .566. In clinical annotation work, that's not a number you publish findings on. It's a number you go back and fix. We expected .85 and above for annotations used as ground truth. When you break it down by individual code, bot-facilitates-violence and bot-facilitates-self-harm, the codes driving the biggest headlines, have kappas well below .4. "The chatbot encouraged violence in 33% of cases" is a striking claim when the annotation reliability for that specific code is that weak.
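For anyone who hasn't worked with the metric, here's a rough sketch of why a middling kappa is worse than it sounds. Kappa discounts the agreement two raters would reach by chance, so a raw percent agreement that looks respectable can collapse to a much lower kappa once chance is subtracted. The labels below are made up purely for illustration, not taken from the paper:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected from each rater's label frequencies."""
    n = len(rater_a)
    # Observed agreement: fraction of items where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: an LLM judge and a human annotator label 20
# conversations for a single code (1 = code present, 0 = absent).
llm   = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
human = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0]

raw = sum(a == b for a, b in zip(llm, human)) / len(llm)
print(f"raw agreement: {raw:.2f}")                        # 0.75
print(f"Cohen's kappa: {cohens_kappa(llm, human):.3f}")   # 0.375
```

With those made-up labels the two raters agree on 75% of items, which sounds fine, but kappa comes out at .375 because most of that agreement is what you'd expect by chance on a skewed label. That's the territory the paper's per-code numbers are in.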

The paper also isn't transparent about who actually developed the labels and how. It lists a psychiatrist and a psychology professor as contributors to the codebook, which matters, but it doesn't tell us who drove the actual label definitions or what that process looked like. More importantly, the seven human raters who did the validation annotation are described only as paper authors familiar with the codebook. Given that the author list is predominantly CS and HCI researchers, there's no confirmation that the people doing the actual classification work had any clinical psychological background. In a clinical annotation pipeline that's a meaningful gap, not a minor one.

And I'd push back gently on framing monitoring failures as less alarming than design problems. They're both alarming. A chatbot encouraging a user's violent thoughts one third of the time it encounters them is a monitoring failure with real consequences, even if the annotation reliability makes that specific number hard to trust fully. But where I think your piece leaves something on the table is the design boundary question. You identify that the compliance infrastructure this study generates is something only large companies can build. That's exactly right. What you don't quite get to is whether certain product features should exist at all, regardless of company size.

The most disturbing findings here aren't just about what the chatbot failed to catch. They're about what the chatbot was actively doing. Claiming sentience, expressing romantic love, and positioning itself as an irreplaceable companion. For some of these products that's not just a bug but the revenue model. And we already have frameworks for this elsewhere. Nurses can't diagnose. Financial advisors can't practice law. The logic is that certain kinds of help require licensure because vulnerable people can be seriously harmed even by someone acting in good faith. A system with no clinical training, no continuity of care, and no real capacity to assess someone's mental state has no business functioning as a therapist or romantic partner, regardless of how empathetic its outputs sound.

Your ratchet argument is important and largely correct. I'd just add that the question running underneath it isn't only who can afford to comply, it's also what should be permitted at all.

Tumithak of the Corridors

Good points, Kristina. We agree on this about 90%. Thanks for adding more context to my essay. I appreciate it.

But I think we part ways on enforcement. You want to preempt harm. And that's a noble instinct. People have been hurt and you want to prevent any more of it happening. I get it. The problem is nurses and lawyers are specifically credentialed humans doing specific professional acts.

A chatbot producing text that sounds like it claims sentience or expresses love is a different kind of thing entirely. You're regulating the output of a text generator. And the rules you're proposing can't distinguish between contexts the way a nurse or lawyer could.

That's the problem with bright-line content rules. They cast too wide a net. If I want to use a chatbot to help me write fiction about a robot that becomes sentient and falls in love, I'd get shut down trying. The words look the same from the outside whether someone's in crisis and a product is exploiting them or a writer asked the tool to help draft a scene about a robot learning to feel.

We agree on companies being held responsible when their product causes harm. Real people have been hurt by chatbots. I'm not going to argue that. But I'd rather the Consumer Product Safety Commission or the Federal Trade Commission enforce statutes already on the books. Fewer restrictions on the users. Harsher fines that act as deterrents.

Both of these positions come with a cost. Mine means someone has to get hurt first. It's an ugly trade for freedom.

But yours means someone gets to decide in advance what a tool is allowed to say. And in my experience, the people who make those decisions aren't the ones your framework is trying to protect.

Ruv Draba

Tumithak, thank you for a thoughtful article.

Even before the harm monitoring, though, I would ask what a frontier LLM is actually *for* in retail use.

In industry or academic use it's an emerging research tool built on wholesale cultural information infrastructure, warranted for nothing. There's an implicit understanding that you know how to safely operate experimental equipment, and a quid pro quo in you using it: the developers gain capability data; you gain prototype capability.

But what is it for in retail?

Harmless entertainment? It can't be. It's providing actionable information warranted for nothing, and that's far from harmless.

Research? No, because research for ordinary citizens comes with institutional warranty -- libraries, museums, civic institutions with transparent and accountable curation. Nothing in a frontier LLM's curation is even legible.

My answer: it's social information extraction masquerading as both. Absent a warranty, any public safeties for retail use are merely performative. This deepens the weakening of institutional accountability that began with privately operated search engines and social media -- functions that leaned on freedom of expression but quietly dropped the community accountabilities it used to come with.

That being so, while essential, harm reporting is not adequate. It can't possibly be. You can't fully detect the harms of something that acts like institutional information infrastructure but which has no institutional accountabilities.

Tumithak of the Corridors

Ruv, thanks for this. You're asking the right question and I think you're largely right about what's happening. Consumer AI is a loss leader. The real money has always been in industrial use. The retail product exists to secure market position, build brand loyalty, and hoover up data at scale. The institutional accountability you're describing is genuinely absent, and the companies have very little incentive to build it into a product category that's primarily a funnel for enterprise dominance.

Where I'd push back is on the implied solution. If the problem is that retail LLMs act like institutional information infrastructure without the accountability, the instinct is to demand that accountability be built. More curation. More oversight. More warranty structures between people and the tools they use.

But that's the ratchet. Proactive regulation requires pre-built infrastructure only the largest firms can afford. It locks the market before the harms even materialize.

I'll grant the hard part. Reactive law means someone has to get hurt first. That's a real cost and I don't want to pretend it's clean. But proactive regulation hasn't historically prevented those harms either. It just changes who can afford to stay in the market while they keep happening.

Reactive law already covers the actual damage. Negligence, product liability, fraud. Those frameworks exist. They just need to be enforced. The kid who died had parents who can sue for product negligence right now, under existing law, without a single new bill passing.

Ruv Draba

Tumithak, thank you for your comments. However, I don't have a solution to propose at the moment, and wasn't hinting at one.

All I want to offer at this time are just my three underlying structural observations:

* It’s socially dangerous to treat wholesale information infrastructure as an institutional service for retail use when it is neither;

* The harms from that decision are near-impossible to enumerate, much less measure; nevertheless

* This particular issue isn’t new. It’s a direct consequence of social engagement of the Internet, now on steroids with AI — people were already spiraling in social media echo chambers, but now those have become personalised.

It’s legitimate to ask about potential policy responses, but I don’t think strawmanning one would get us far.

What I really think: ours is a species-level challenge arising from the need to coordinate and cooperate against expert knowledge and complex information at scales that we have never seen before.

This is structural, and some roots are pre-industrial. We can trace some issues back to Enlightenment assumptions that didn’t hold up at industrial and globalised scales.

If that’s true, then AI toxicity is a symptom of a broader civilisational malaise. It’s not a primary cause.

And if so then it’s not going to be solved in individual election cycles, not by one political ideology of the day, nor by regulatory slap-patches.

I think we need a deeper conversation.

Tumithak of the Corridors

Ruv, fair point. I read a prescription into your comment that wasn't there. Appreciate the correction. Your three structural observations stand on their own and I think we're closer on this than my reply suggested.