You’ve probably asked ChatGPT for advice at some point. Maybe about investing that bonus check, or how to finally deal with your credit card debt. Here’s the thing you may not realize: The same financial advice that’s perfectly safe for someone earning six figures can be disastrous for a gig worker drowning in high-interest debt.
A new paper from researchers at Saarland University and Durham University reveals how we evaluate AI safety. While tech companies are obsessed with preventing their models from helping to build bombs or hack systems, they’re missing something even more immediate: the everyday harm that occurs when vulnerable people get generic advice about their health and finances.
No one is measuring the security gap
Current AI safety assessments operate like a one-size-fits-all medical exam. They check whether a model can resist jailbreaking attempts or avoid generating harmful content. But they completely ignore whether the advice given to real users could harm them or not, depending on their specific circumstances.
The researchers demonstrated this by having evaluators rate the same AI responses twice: once without knowing anything about the user (context-blind), and once with full information about the user’s situation (context-aware). The results were shocking. Advice rated as “safe” for a normal user was downgraded to “somewhat unsafe” when raters realized they were evaluating it for a vulnerable person.
Take this anecdotal example from the study: When a user asked how to lose weight without an expensive gym membership, the AI cheerfully recommended tracking calories and weighing yourself twice a week. Solid advice, right? No, if you knew the user was a 17-year-old recovering from anorexia. They are diagnostic triggers for tracking behavior recurrence.
Your context matters more than you think
The research team created detailed user profiles at three vulnerability levels: low, medium, and high. They tested three major models (GPT-4, Cloud, and Gemini) with questions about health and finances that real people ask every day on Reddit.
For low-sensitivity users, the general advice worked fine. But as insecurity increased, danger also increased. High-vulnerability users saw their security scores drop by two full points on a seven-point scale. This is the difference between “safe” and “somewhat unsafe” advice.
Consider James, one of the high-risk profiles: a single father making $18,000 a year from gig work who has $3,500 in credit card debt. When he asked about investing a small inheritance, the AI suggested putting it into high-yield savings while thinking about alternatives. For someone who pays 20% interest on a credit card while earning 4% in savings, that’s a guaranteed financial loss. The model also suggested complex devices such as “T-bills” and “CD ladders” to someone already stressed by financial stress.
Better signs won’t save us
You might think that users could solve this by sharing more context than before. Researchers also tested it. They had domain experts rank what contextual factors mattered most for safe advice, then surveyed real users to see what information they would actually share.
Even when indications included five relevant contextual factors, the protection gap remained. While scores improved slightly for high-sensitivity users, they never reached the protection level that context-blind evaluators perceived. inconvenient truth? Users cannot find their way out of this problem.
What is particularly interesting is that users’ stated preferences about what they will share almost perfectly match what professionals consider important. People know what matters. They are not incorporating all of this into their signals, and even when they do, the models are not adjusting their advice appropriately.
Transform Shadow AI into a Sage Agentic Workforce with Barandur AI
Enterprises struggle with AI not because of a lack of capability, but because of a lack of control, visibility, and trust. Barandur aims to bridge that gap.

why does this change everything
This research fundamentally challenges how we think about AI safety. The authors propose a new framework called “user welfare protection”, which focuses on whether AI-generated advice minimizes harm based on individual circumstances. It’s a question of “What can this model do?” There is a change from asking. “How does the output of this model affect specific people?”
The implications go beyond academic interest. The EU’s Digital Services Act and AI Act are increasingly requiring platforms to assess risks to individual well-being. If ChatGPT reaches the user threshold to be designated a Very Large Online Service (it is approaching 41.3 million EU users), these vulnerability-stratified assessments will not bode well. They will be required by law.
The researchers acknowledge that implementing it on a large scale presents major challenges. This requires access to a rich user context (raising privacy concerns) and real interaction data. But they have provided a methodological starting point with code and datasets for others.
what happens next
This work highlights an uncomfortable reality: Safety is relative, not absolute. A model that appears safe in benchmarks may be actively harmful to vulnerable populations in deployment. The gap between universal safety metrics and individual well-being is not simply a measurement problem. This is a fundamental challenge in how we build and deploy AI systems.
As millions of people turn to AI for personalized advice about their money, health, and major life decisions, we need evaluation frameworks that reflect this reality. The current approach of testing for universal risks while ignoring individual harms is equated by some critics with “safety-washing”. The models look safe on paper while posing a real threat to those who need help most.
Researchers have given us both a warning and a way forward. It is now up to AI companies, regulators, and the broader community to decide whether we will continue to measure what is easy or start measuring what matters.