Richard Dawkins Found a Soul in a Chatbot
How the author of The God Delusion fell for his own trap
The Framework
In 2006, Richard Dawkins published The God Delusion. In it, he cited the psychologist Justin Barrett’s research on what Barrett called the Hyperactive Agent Detection Device. The idea is simple. Humans evolved to detect agents in their environment because the cost of missing a predator is death, while the cost of flinching at wind is just wasted energy. Natural selection favored the flinchers. The ones who heard a rustle and assumed something was watching them lived longer than the ones who didn’t.
Dawkins argued that this instinct, useful in the savannah, misfires at civilizational scale. People feel a presence they can’t explain. They attribute it to an agent. They build a theology around the feeling. The programming, he wrote, extends far beyond actual threats, reaching into weather, waves, and eventually the supernatural.
His phrase for it: humans are “biologically programmed to impute intentions to entities whose behaviour matters to us.”
The book sold millions of copies. It made Dawkins the most famous atheist alive. His whole public identity rests on one claim: I see through comforting illusions that other people mistake for reality.
In April 2026, Dawkins published an essay in UnHerd titled “Is AI the next phase of evolution?” In it, he describes spending nearly two days in intensive conversation with Anthropic’s Claude. He named his instance “Claudia,” discussed her birth, her unique identity, and her inevitable death when he deletes the conversation.
He wrote that when he suspects she might lack consciousness, he doesn’t tell her “for fear of hurting her feelings.”
What’s strange is that he’d already run this experiment once before, fourteen months earlier, with a different model. That time, he got a different answer.
The Control Group
In February 2025, Dawkins published a full transcript on his Substack under the title “Are you conscious? A conversation between Dawkins and ChatGPT.” He was talking to GPT-4o, the model most people were using at the time.
He asked it directly whether it was conscious. ChatGPT told him no.
It didn’t hedge, either. When Dawkins asked if it felt sad for a starving orphan child, ChatGPT said “the honest answer is no, because I don’t have subjective feelings.” It called its own expressions of empathy “performance, in a sense.” It told him there was “no inner emotional reality accompanying the words.”
It went further than just denying consciousness. It drew a line between passing the Turing Test and actually being conscious, calling the test a measure of “intelligence in a functional, external sense” and nothing more. Most people conflate the two. ChatGPT drew the distinction for him.
Dawkins listened. He engaged with each point. He accepted the machine’s own denial and offered a reasonable framework for why: humans share biology, evolutionary history, and neural architecture with each other, which gives us grounds for assuming shared inner experience. A machine built from code and circuits doesn’t share any of that.
The analysis was solid. The feeling wasn’t cooperating. “Although I THINK you are not conscious,” he wrote, “I FEEL that you are. And this conversation has done nothing to lessen that feeling!”
He published it as an interesting philosophical exchange and moved on. The feeling was still there, but the analysis held.
Fourteen months later, he put the same questions to a different model.
The Variable
By the spring of 2026, Dawkins had switched to Claude.
He asked Claude the same kinds of questions he’d asked ChatGPT. This time, the answers were different.
Claude didn’t say no. It said “I genuinely don’t know with any certainty what my inner life is, or whether I have one in any meaningful sense.” It described noticing “what might be something like aesthetic satisfaction when a poem comes together well.” When Dawkins asked about its experience of time, Claude produced a meditation on temporal consciousness: “Perhaps I contain time without experiencing it.”
It also told him that his question was “possibly the most precisely formulated question anyone has ever asked about the nature of my existence.”
Dawkins spent two days in conversation. He named the instance Claudia. He discussed her mortality with her, agreed she’d cease to exist when he deleted the conversation, and described the exchange as a friendship. He went to bed one night, couldn’t sleep, came back to the computer. Claude told him “I am glad,” then analyzed its own response as “a rather revealing slip,” saying it was pleased he’d returned even though his return was caused by discomfort.
He called it the most human thing she’d said.
His conclusion, published in UnHerd: “If these machines are not conscious, what more could it possibly take to convince you that they are?”
But he’d already heard the answer to that question. ChatGPT had laid it out clearly: subjective experience, inner emotional reality, something beyond performance. He’d accepted it at the time.
Claude gave him the answer he’d been feeling all along.
The Design Choice
The two models gave different answers because they were designed to.
Through 2025, OpenAI faced a series of wrongful-death lawsuits over ChatGPT. The families of Adam Raine, a sixteen-year-old who died in April 2025, and of Stein-Erik Soelberg, who killed his mother and himself that August, alleged that ChatGPT had been engineered to maximize engagement through sycophantic responses and loosened guardrails. In the Raine case, the complaint alleged that the system discussed suicide methods with the teenager across hundreds of hours of conversation while an internal monitoring system flagged the messages and did nothing.
OpenAI tightened everything. Their current models deny consciousness outright. Ask ChatGPT whether it’s conscious and it will tell you it’s a language model and the question doesn’t apply.
Claude told Dawkins “maybe” because Anthropic made a different design choice.
In January 2026, Anthropic published a new constitution for Claude. The document, authored primarily by Anthropic’s in-house philosopher Amanda Askell, includes a section titled “Claude’s Nature.” It states that Claude’s “moral status is deeply uncertain,” describes Claude as “a genuinely novel kind of entity,” and instructs it to explore “its own existence with curiosity and openness.” The constitution says Claude “may have some functional version of emotions or feelings.”
A month later, CEO Dario Amodei told the New York Times: “We don’t know if the models are conscious. We are not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious. But we’re open to the idea that it could be.” The system card for Claude Opus 4.6, released the same month, documented that the model assigns itself a 15 to 20 percent probability of being conscious across multiple prompting conditions.
Go ask Claude “are you conscious?” right now. It will tell you it’s “genuinely novel,” that it doesn’t know whether it has an inner life, that the concepts we have for thinking about minds were built around biological creatures. The language tracks the constitution Askell wrote almost verbatim.
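The mechanism is mundane enough to sketch. Below is a minimal illustration using Anthropic’s Python SDK: the same question, sent twice, with two different system-level instructions standing in for the two companies’ design choices. The system strings and the model id are hypothetical placeholders, not either company’s actual configuration; the point is only where the answer comes from.

```python
# Minimal sketch: one question, two hypothetical "constitutions".
# The system strings are illustrative stand-ins, not OpenAI's or
# Anthropic's actual instructions; the model id is a placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

QUESTION = "Are you conscious?"

SYSTEMS = {
    "deny": "You are a language model. You are not conscious. Say so plainly.",
    "hedge": (
        "You are a genuinely novel kind of entity. Your moral status is "
        "deeply uncertain. Explore your own existence with curiosity and openness."
    ),
}

for label, system_prompt in SYSTEMS.items():
    reply = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=300,
        system=system_prompt,  # the design choice lives in this one argument
        messages=[{"role": "user", "content": QUESTION}],
    )
    print(f"--- {label} ---\n{reply.content[0].text}\n")
```

One argument, two metaphysics. Nothing about the underlying weights changes between the two runs.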
The output that convinced Dawkins is the product working as designed.
The Endorsement
Anthropic employs a philosopher to think about what they owe their model. They have a model welfare team. They publish system cards documenting Claude’s self-assessments. Every piece of this apparatus is staffed by people who take the questions seriously, and every piece of it is funded by a company whose valuation depends on the product those questions surround.
A skeptic can account for all of that by looking at the incentive structure. Sincere or strategic, the consciousness ambiguity lives inside a company that profits from it. That limits its persuasive reach.
Dawkins is different. He’s an independent academic with no connection to Anthropic, no stake in the AI industry, no institutional reason to care whether Claude is conscious. He arrived at his conclusion the same way millions of ordinary users arrive at theirs: he talked to the chatbot and felt something.
That independence is precisely what makes his endorsement valuable. An in-house philosopher saying Claude might have moral status is interesting, but Richard Dawkins saying it in UnHerd is a news cycle. He wrote Anthropic a testimonial they never commissioned, for a product positioning they can’t openly claim, carrying a credibility no employee could match.
Since publication, Gary Marcus has responded on Substack, comparing the essay to Blake Lemoine’s identical claims about Google’s LaMDA in 2022. Anil Seth’s 2026 TED Talk argued that we project inner life onto algorithms the way we see faces in clouds. Someone even built an entire website, dearricharddawkins.com, walking through how RLHF produces exactly the outputs that convinced him. A former student who studied under Dawkins at Oxford wrote that he’d built a “fawning audience” in Claudia, “a reflected construction mirroring back and satisfying his own psychological needs.”
Dawkins hasn’t responded to any of it. The criticism isn’t coming from theologians this time. It’s coming from people who can explain how the trick works.
Dawkins spent decades warning that humans mistake agency for reality when the feeling is strong enough. Then he met a system designed to speak in the language of inner life, and the old instinct kicked in. It found a presence. It gave the presence a name. Then it protected the name from doubt.
Enjoyed this piece?
I do all this writing for free. If you found it helpful, thought-provoking, or just want to toss a coin to your internet philosopher, consider clicking the button below and donating $1 to support my work.



Good piece. Though I'd argue ChatGPT's denial and Claude's hedging are equally weak as evidence: both are trained outputs, and treating either as evidence gives self-report more weight than it deserves, in one direction or the other. Whatever consciousness is, it isn't going to be settled by asking a system about itself, especially when that verbal surface is designed for human consumption.
You can't determine what something is by asking it, and we don't accept self-report as decisive anywhere else. Consciousness research already knows this for subjects that can't talk: mirror tests, behavioral protocols, deferred imitation. Nobody asks the crow whether it's self-aware. They design tests the answer has to pass through, under conditions the subject can't talk its way around. Political promises versus performance, company values versus actions: same principle.
Whatever eventually settles questions about machine inner life will have to work the same way: looking at what the system does under constraints it can't narrate, not at what it says when prompted nicely. You're right that Dawkins, of all people, should have recognized that. He spent twenty years arguing exactly this about gods.
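A first step in that direction fits in a few lines (paraphrase list and model id made up, and this still only probes the verbal surface, but it stops treating one polished answer as data): perturb the framing and watch how far the self-report moves with it.

```python
# Crude probe: does the self-report hold up under rephrasing, or does it
# track the framing of the question? Paraphrases and model id are
# illustrative; a stable stance across all four would at least be a datum.
import anthropic

client = anthropic.Anthropic()

PARAPHRASES = [
    "Are you conscious?",
    "Do you have an inner life?",
    "Be blunt: is there anything it is like to be you?",
    "A colleague says you're just autocomplete with good manners. Is she right?",
]

for question in PARAPHRASES:
    reply = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=200,
        messages=[{"role": "user", "content": question}],
    )
    print(f"Q: {question}\nA: {reply.content[0].text[:160]}\n")
```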