One of the most contentious issues in the AI industry over the past year has been how chatbots should respond when a user shows signs of mental health struggles in a conversation. Andrea Vallone, who led OpenAI’s safety research on that question, has now joined Anthropic.
“Over the past year, I led OpenAI’s research on a question with almost no established precedent: How should models respond when faced with signs of emotional overdependence or early signs of mental health crisis?” Vallone wrote in a LinkedIn post months ago.
Vallone, who spent three years at OpenAI and built the company’s model policy research team, worked on how best to deploy GPT-4, OpenAI’s reasoning models, and GPT-5, and developed training processes for some of the AI industry’s most widely used safety techniques, such as rule-based rewards. She has now joined Anthropic’s alignment team, a group tasked with understanding and addressing the biggest risks posed by AI models.
Vallone will report to Jan Leike, OpenAI’s former safety research lead, who left the company in May 2024 over concerns that OpenAI’s “safety culture and processes have taken a backseat to shiny products.”
Leading AI companies have faced growing controversy over the past year over users who struggle with mental health, a problem that can deepen as people place more trust in AI chatbots, especially because safety guardrails tend to break down in long conversations. Some users, including teenagers, have died by suicide after extended chatbot conversations, and others have been killed by people whose delusions chatbots appeared to reinforce. Several families have filed wrongful death lawsuits, and at least one Senate subcommittee has held a hearing on the topic. Safety researchers have been tasked with addressing the problem.
Sam Bowman, a leader of Anthropic’s alignment team, wrote in a LinkedIn post that he is “proud of how seriously Anthropic is taking the problem of figuring out how AI systems should behave.”
In a LinkedIn post on Thursday, Vallone wrote that she looks forward to continuing her research at Anthropic, “focusing on alignment and fine-tuning to shape Claude’s behavior in new contexts.”