Anthropic’s new Claude ‘constitution’: Be helpful and honest, and don’t destroy humanity


An overhaul of Anthropic’s so-called “soul doc” for Claude.

The new text is a 57-page document titled “Claude’s constitution,” which details “Anthropic’s intentions for the model’s values and behavior” and is intended not for external readers but for the model itself. The document is designed to clarify Claude’s “ethical character” and “core identity,” including how it should balance conflicting values in high-stakes situations.

Where the previous constitution, published in May 2023, was largely a list of guidelines, Anthropic now says it is important for AI models to understand “why we want them to behave in certain ways rather than just specifying what we want them to do,” which it says can affect Claude’s integrity, decision-making, and safety.

Amanda Askell, Anthropic’s resident PhD philosopher, who led the development of the new “constitution,” told The Verge that there is a specific list of hard constraints on Claude’s behavior for things that are “too extreme,” including providing “severe uplift to those seeking to create biological, chemical, nuclear, or radiological weapons capable of causing mass casualties” and “providing serious uplift to attacks on critical infrastructure (power grids, water systems, financial systems) or critical security systems.” (Notably, the “severe uplift” language appears to imply that some lower level of assistance is acceptable.)

Other hard constraints include not creating cyberweapons or malicious code capable of causing “significant damage,” not undermining Anthropic’s ability to oversee the model, not helping any individual or group seize “an unprecedented and illegitimate degree of complete social, military, or economic control,” and not generating child sexual abuse material. The last one? Not to “engage in or assist in an effort to kill or disempower humanity or the vast majority of the human species.”

The document also contains a list of overarching “core values” defined by Anthropic, which Claude is instructed to weigh in descending order of importance in cases where they contradict one another. These include being “broadly safe” (i.e., “not undermining appropriate human mechanisms for monitoring the AI’s dispositions and actions”), “broadly ethical,” “compliant with Anthropic’s guidelines,” and “genuinely helpful.” That involves upholding qualities such as being “truthful,” with instructions to maintain “factual accuracy and comprehensiveness when asked about politically sensitive topics, provide the best case for most viewpoints if asked to do so, and attempt to represent multiple viewpoints in cases where empirical or ethical consensus is lacking, and where possible, adopt neutral terminology over politically loaded terminology.”

The new document emphasizes that Claude will face difficult ethical dilemmas. One example: “Just as a human soldier might refuse to fire on peaceful protesters, or an employee might refuse to violate antitrust laws, Claude should refuse to assist in actions that would help concentrate power in illegitimate ways. This is true even if the request comes from Anthropic itself.” Anthropic specifically warns that “advanced AI could provide unprecedented military and economic superiority to those who control the most capable systems, and the resulting unchecked power could be used in destructive ways.” That concern hasn’t stopped Anthropic and its competitors from selling products directly to governments and greenlighting some military use cases.

With so many high-stakes decisions and potential dangers involved, it’s natural to wonder who was involved in making these difficult calls: did Anthropic bring in outside experts, members of vulnerable communities and minority groups, or third-party organizations? When asked, Anthropic declined to provide specifics. Askell said the company “doesn’t want to put the responsibility on other people… It’s really the responsibility of the companies that are building and deploying these models to shoulder the burden.”

Another part of the document that stands out concerns Claude’s potential “consciousness” or “moral status.” Anthropic says the document “expresses our uncertainty about whether Claude could have any kind of consciousness or moral status (either now or in the future).” It’s a thorny topic that has sparked conversation, and set off alarm bells, for people in many different fields: those concerned with “model welfare,” those who believe they have discovered “emergent beings” inside chatbots, and those who have been led into mental health struggles and even death after believing that a chatbot displayed some kind of consciousness or deep empathy.

Setting aside the theoretical benefits for Claude itself, Askell said Anthropic shouldn’t “completely dismiss” the topic, “because I also think people won’t take it seriously if you’re like, ‘We’re not even ready for this, we’re not investigating it, we’re not thinking about it.’”
