Does Anthropic think Claude is alive? Define ‘alive’


Over the past several weeks, as more and more Anthropic executives have given interviews as part of a publicity blitz for Claude, one thing has become increasingly clear: Anthropic sure seems to think Claude is alive in some way, shape, or form.

“Alive” is obviously a loaded term; the more frequently used word is “conscious.” If you ask Anthropic if the company thinks Claude is alive, the company will flatly deny it, but stop short of saying the models aren’t conscious.

Kyle Fish, who leads model welfare research at Anthropic, told The Verge, “No, we don’t think Claude is ‘alive’ like humans or any other biological organisms. Asking whether they’re ‘alive’ is not a helpful framing for understanding them, as it typically refers to a fuzzy set of physiological, reproductive, and evolutionary characteristics.” Instead, he believes that “Claude, and other AI models, are a new kind of entity altogether.”

And is that new entity conscious? “Questions about potential internal experience, consciousness, moral status, and welfare are serious ones that we’re investigating as models become more sophisticated and capable, but we remain deeply uncertain about these topics,” he said.

“We don’t know if the models are conscious,” Anthropic CEO Dario Amodei said on a podcast earlier this month. He specified that the company has taken “a generally precautionary approach here” in that Anthropic is “not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious. But we’re open to the idea that it could be.”

It’s a position of highly suggestive uncertainty. Anthropic is effectively telling people that it believes chatbots might already be thinking, feeling entities — far more publicly than OpenAI, xAI, Google, or virtually any other major consumer AI company. It’s making a claim many experts consider an extreme long shot, while reinforcing ideas that have caused real harm, including some deaths by suicide among people who believed that the chatbot they were speaking with exhibited some form of consciousness or deep empathy.

Over the course of interviews for podcasts, profiles, and feature articles, Amodei and other company leaders have repeatedly refused to rule out the possibility that Claude might be conscious and instead raised questions about how something can be conscious in a different way than humans. Anthropic’s chief philosopher Amanda Askell told The New Yorker, “If it’s genuinely hard for humans to wrap their heads around the idea that this is neither a robot nor a human but actually an entirely new entity, imagine how hard it is for the models themselves to understand it!”

These interviews don’t precisely define “conscious,” a term whose meaning experts disagree on anyway. A starting point, from the Merriam-Webster dictionary, is “the quality or state of being aware especially of something within oneself” or “the state of being characterized by sensation, emotion, volition, and thought.” That seems not far off from Anthropic’s use of the term.


Many scientists say it’s not possible for AI systems like large language models to become conscious in any way, as they’re fundamentally rooted in mathematics and probability. As two Polish researchers wrote last year, “because the remarkable linguistic abilities of LLMs are increasingly capable of misleading people, people may attribute imaginary qualities to LLMs.”

Anthropic has framed its statements as open-mindedness that will build trust with users, and it has indicated that, whether or not Claude is conscious, better outcomes will follow if the company acts as though it might be. Last month, Anthropic overhauled “Claude’s Constitution,” internally nicknamed its “soul doc” — a provocative way to refer to a set of guidelines for an AI model. In a release, Anthropic said the chatbot’s so-called “psychological security, sense of self, and wellbeing … may bear on Claude’s integrity, judgement, and safety.” The company also said that it was “express[ing] our uncertainty about whether Claude might have some kind of consciousness or moral status (either now or in the future).”

Anthropic has a “model welfare” team, and Amodei has said that Anthropic has “taken certain measures to make sure that if we hypothesize that the models did have some morally relevant experience — I don’t know if I want to use the word ‘conscious’ — that they have a good experience … We’re putting a lot of work into this field called interpretability, which is looking inside the brains of the models to try to understand what they’re thinking.”

When someone believes an AI system is conscious, it can lead to behaviors that many people would deem risky or dangerous — becoming emotionally dependent on an AI system one believes is sentient can lead to isolation from loved ones, detachment from reality, and increased mental health struggles. In severe cases, some involving minors, it has preceded physical harm or death. Opinions are divided on whether Anthropic deserves credit for not ruling out the possibility of consciousness or is behaving irresponsibly by feeding into potential delusions.

Even saying current-generation language models are only possibly conscious is a loaded claim with a high burden of proof. Language is not the same as consciousness, and Anthropic itself has emphasized that just because LLMs spit out evocative speech doesn’t mean it accurately represents their internal state. When The Verge spoke with Askell about Claude’s Constitution last month, she noted that because models are trained on a huge corpus of human data, they are extraordinarily good at sounding human, even if only through mimicry — so it makes sense that some people would struggle not to attribute consciousness to something that sounds that way.

There’s no guarantee chatbots’ human-like output reflects their actual internal state

AI models may reference human concepts, for instance, because they don’t have other words to draw from. One example, Askell told The Verge, is an AI model that potentially acts as if being shut down, or a conversation ending, is a form of death. “They’re trained in this deeply human-like way and on this human experience. So this can cause these problems … [They may see these things] as a kind of death because there are not a lot of analogies that they have. They have to draw on these human analogies … They don’t have another language or set of concepts.”

Amodei has said that researchers “find things that are evocative” of AI systems having some form of emotion. “There are activations that light up in the models that we see as being associated with the concept of anxiety or something like that. When characters experience anxiety in the text, and then when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up.” But, he went on to say, “Does that mean the model is experiencing anxiety? That doesn’t prove that at all.” Out of an abundance of caution, the company has introduced an “I quit” button of sorts that lets Claude stop doing a job it ostensibly doesn’t want to do, but Amodei said Claude rarely chooses that option, and it usually happens in test cases where the model is asked to generate certain sorts of illegal material.

Around the release of Claude’s Constitution, Anthropic wrote, “We are caught in a difficult position where we neither want to overstate the likelihood of Claude’s moral patienthood nor dismiss it out of hand, but to try to respond reasonably in a state of uncertainty.”

In the same conversation, Askell said part of her doesn’t “think it serves anyone to come out and declare, ‘We are completely certain that AI models are not conscious,’” or the opposite. But at the same time, she said, some people have constructed their own views or beliefs “based on mere model outputs.”

As for moral status, Askell said at the time that she maintains Anthropic should not be “fully dismissive” of the topic, “because also I think people wouldn’t take that, necessarily, seriously, if you were just like, ‘We’re not even open to this, we’re not investigating it, we’re not thinking about it.’”
