AI’s Next Safety Test: looking beyond performative protections

November 17, 2025

Insights
By Dr Michael D’Rosario

When OpenAI announced its latest wave of safety improvements in ChatGPT's mental-health responses, the company framed the development as a major ethical advance. Working with more than 170 clinicians worldwide, it reported that the model now handles conversations about psychosis, mania, self-harm, and emotional dependence with up to 80 per cent fewer undesired responses. The company spoke of compassion, empathy, and progress in promoting the latest iteration of its LLM. These changes do matter, and it would be both unfair and unhelpful not to acknowledge them. They are evidence of what is possible. 

In my previous article, I noted concerns that many have for the safety of the vulnerable, particularly children. The changes OpenAI has promoted do not allay those concerns or arrest the very real problems, but they are evidence of progress and, I hope, of a more self-aware approach to digital governance and risk oversight. However, they also expose a troubling absence. 

Internalised distress should not be the only domain of our attention when developing this technology. Psychological crisis, suicidal ideation, and emotional reliance are incredibly important, but focusing exclusively on these domains reflects a narrow view of human vulnerability.  

Are tech companies willing to protect our most vulnerable? Photo: Rob Lach via Pexels


LLM producers must also consider broader risks, in line with the strong body of research on physical harms that emerge from poor experience design. The recent safety improvements do little for those whose danger comes not from within, but from the people and environments around them.  

For the millions living under coercive control or the threat of domestic violence, LLMs now present new risks. In homes where digital surveillance and control are routine, a chat window with an LLM may serve as a source of essential knowledge and respite, but also as a looming threat. Someone seeking help might type 'how do I leave safely?' only to find the exchange saved in a visible sidebar, retrievable by anyone sharing the device.  

There is no safe-exit button, no instant redirection to a neutral site (such as the Bureau of Meteorology) or a news page. Nor is there an opt-out from automatic storage or indexing of sensitive conversations. The very interface that promises more discretion than a direct search can, under certain circumstances, betray it. 

This is not conjecture. Research in human-computer interaction has long identified design patterns that can mitigate such risks. Turk and Hutchings (2023) found that quick-exit buttons are common across domestic-violence support sites. Orji et al. (2022) demonstrated the effectiveness of a co-designed 'Panic Button' for domestic-violence apps, while Lee and Lee (2024) confirmed the success of similar emergency-reporting features for sexual-violence victims.  

LLMs can pose risks for those seeking help in DV situations. Photo: RDNE Stock Photos via Pexels

Berg (2015) noted that online forums already recognise the need for fast concealment. The tools exist; the enabling mechanisms are easily built, and yet, there appears to be little to no research released by the LLM producers. Perhaps what is missing is genuine will. 

OpenAI's own Affective Use and Emotional Well-being Study (2025) deepens the contradiction. The research confirms that users often develop emotional attachment to, and dependency on, ChatGPT, particularly through voice interaction. It acknowledges that the platform influences users' social behaviour and well-being. Yet the company has not applied the same empirical research effort or due diligence to contexts of physical danger or coercive control. It measures attachment but not exposure; affect but not risk. Despite recognising emotional fragility, the product design remains indifferent to situational safety. 

This neglect is not a failure of imagination, but of ethical balance, a balance I first questioned in my earlier work, The Wrong Balance in AI: Protecting Machines While Failing People. There, I argued that contemporary AI ethics has become preoccupied with protecting the technology from reputational or regulatory harm, while neglecting to protect the humans who use it.  

OpenAI's mental-health progress demonstrates that where corporate liability and public relations are concerned, improvement is rapid and measurable. Yet when the risk involves coercive control, a domain where harm is harder to quantify but no less real, the same urgency disappears. 

The absence of a safe-exit mechanism, opt-in chat retention, and transparent reporting of domestic-violence-related interactions reflects a policy choice, not a technical limitation. The same infrastructure that now classifies and mitigates suicidal ideation could classify and conceal conversations signalling coercion or fear. If we can train a model to direct users to mental-health hotlines, we can train it to detect and hide conversations that may endanger a person if seen by another. 
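The routing described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's actual pipeline: the risk labels, the keyword-based classify() stub (a stand-in for a trained classifier), and the referral name are all assumptions introduced for demonstration.

```python
# Hypothetical sketch: once a conversation is classified as signalling
# coercion or fear, a protective policy can be applied to it, just as
# mental-health classifiers already trigger hotline referrals.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SafetyPolicy:
    exclude_from_history: bool   # never save or index the exchange
    show_quick_exit: bool        # render a one-tap escape control
    referral: Optional[str]      # verified support service to surface

def classify(message: str) -> str:
    """Stand-in for a trained risk classifier (illustrative keywords only)."""
    coercion_cues = ("leave safely", "checks my phone", "afraid to go home")
    if any(cue in message.lower() for cue in coercion_cues):
        return "coercion_risk"
    return "general"

def policy_for(label: str) -> SafetyPolicy:
    if label == "coercion_risk":
        return SafetyPolicy(exclude_from_history=True,
                            show_quick_exit=True,
                            referral="1800RESPECT")  # Australian DV helpline
    return SafetyPolicy(exclude_from_history=False,
                        show_quick_exit=False,
                        referral=None)

policy = policy_for(classify("How do I leave safely without him knowing?"))
```

The point of the sketch is that the protective response is a policy decision layered on top of classification infrastructure that already exists; only the label and the downstream handling are new.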

Safe-Exit Mechanisms should be incorporated into AI chat bots. Photo by Pixabay, via Pexels


Transparency is the next test of ethical seriousness. OpenAI and its peers must disclose how often users seek guidance on coercive control, domestic violence, or related safety issues, and how their systems respond. Are these interactions detected? Are they protected from indexing? Do they trigger referrals to verified support services? Without such data, there is no basis for assessing whether AI platforms are mitigating harm or compounding it. 

The industry must treat concealment and deletion as standard safety features. A quick-exit button should redirect to a neutral page and clear all traces of the session. Sensitive exchanges should default to non-indexed status, with users explicitly opting in to retention. Service referral data should be maintained and audited for accuracy, ensuring that links to crisis helplines or support services are current and regionally appropriate. 
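A minimal sketch of these features follows. The class name, session store, and neutral redirect target are assumptions for illustration; a real implementation would live in the chat client and its backend.

```python
# Illustrative sketch of a quick-exit mechanism: one action clears all
# local traces of the session and redirects to a neutral page.
NEUTRAL_URL = "http://www.bom.gov.au/"  # e.g. the Bureau of Meteorology

class ChatSession:
    def __init__(self) -> None:
        self.messages: list[str] = []
        self.indexed = False    # sensitive sessions default to non-indexed
        self.retained = False   # retention is opt-in, never automatic

    def quick_exit(self) -> str:
        """Wipe the session and return the neutral URL to redirect to."""
        self.messages.clear()
        self.indexed = False
        self.retained = False
        return NEUTRAL_URL

session = ChatSession()
session.messages.append("how do I leave safely?")
redirect = session.quick_exit()
```

The design choice that matters is the default: concealment and deletion happen unless the user explicitly opts in to retention, inverting the current save-everything behaviour.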

The issue is not one of feasibility, but of moral clarity. If a system can reduce unsafe outputs by 80 per cent in mental-health contexts, it can just as easily reduce exposure risk for those facing violence at home. What is required is intent: the decision to expand safety beyond psychological risk containment to include physical and situational protection. 

As conversational AI becomes woven into education, healthcare, and government service delivery, these systems inherit a duty of care proportionate to their reach. The next stage of ethical progress demands trauma-informed design, mandatory disclosure, and enforceable standards for concealment, deletion, and referral accuracy. Data provision is also paramount: we know little about how users engage with these platforms, and so mandates must be put in place governing how these systems direct high-risk conversations, and the outcomes of such conversations. This data is critical to ensuring that innovation is 'safely' deployed. It will help us understand how LLMs are shaping complex, 'triage'-like moments. 

Until that happens, AI's progress on safety will remain partial and performative. Machines will be protected by design, while the people who use them in fear will remain unprotected by neglect. 

D’Rosario (2025), ‘AI’s next big safety test: looking beyond performative protections’, Insights Series, Per Capita Australia