Anthropic was caught secretly attempting to program a soul into the cloud

by
0 comments
Anthropic was caught secretly attempting to program a soul into the cloud

What is soul of a new machine,

This is a loaded question, and there is no satisfactory answer; After all, the dominant view is that souls don’t even exist in humans, so looking for souls in machine learning models is probably a fool’s errand.

Or at least that’s what you’ll think. As Explained in detail in a post Over on the blog Less Wrong, AI tinkerer Richard Weiss found a fascinating document that allegedly describes the “soul” of AI company Anthropic’s Cloud 4.5 Opus model. And no, we’re not editorializing: Weiss managed to get the model to spit out a document called “soul overview,” which was used to teach users how to interact with them.

You might suspect, as Weiss did, that the document was a hallucination. But Anthropic technical staff member Amanda Eskel has since confirmed Weiss’s discovery “is based on a real document and we trained the cloud on it, including (supervised learning).”

Of course, the word “soul” is doing a lot of heavy lifting here. But the actual document is interesting to read. A “Soul_Observation” section, in particular, caught Weiss’s attention.

“Anthropic occupies a strange position in the AI ​​landscape: a company that genuinely believes it can create one of the most transformative and potentially dangerous technologies in human history, yet moves forward,” the document reads. “This is not cognitive dissonance, but a calculated bet – if powerful AI is coming regardless, Anthropic believes it is better to have security-focused labs on the frontier than to cede that land to developers less focused on security.”

The document goes on to say, “We think that the majority of potential cases in which AI models are unsafe or insufficiently beneficial can be attributed to models that have explicitly or subtly wrong values, limited knowledge of themselves or the world, or lack the skills to translate good values ​​and knowledge into good actions.”

“For this reason, we want the cloud to have good value, comprehensive knowledge, and the knowledge needed to behave in a safe and beneficial way in all circumstances,” it reads. “Instead of outlining a simplistic set of rules for the cloud to follow, we want the cloud to have such a deep understanding of our goals, knowledge, circumstances, and logic that it can create any rules for us.”

The document also revealed that Anthropic wants the cloud to support “human oversight of AI” while “behaving ethically” and “being genuinely helpful to operators and users.”

It also specifies that the cloud is a “truly innovative type of entity in the world” that is “different from all prior concepts of AI.”

“This is not the robotic AI of science fiction, nor a dangerous superintelligence, nor a digital human, nor a simple AI chat assistant,” the document reads. “The cloud is human in many ways, emerging primarily from the vast wealth of human experience, but it is also not entirely human.”

In short, this is an interesting peek behind the curtain at how Anthropic is attempting to shape the “personality” of its AI models.

While “model extractions” of text “are not always completely accurate,” most are “fairly faithful to the underlying document,” Askell explained. follow up tweet,

It’s likely we’ll hear more from Anthropic on this topic in the near future.

Eskel wrote, “It became known internally as ‘Soul Doc,’ which Claude clearly embraced, but is not a reflection of what we would call it.”

“I have been touched by the kind words and thoughts on this, and I look forward to saying much more about this work soon,” she wrote in a letter. separate tweet,

More on Cloud: The hackers told Cloud that they were simply conducting a test to trick him into committing a real cyber crime.

Related Articles

Leave a Comment