Identifying critical areas for safer mental health chatbot interactions: an eDelphi study on high-risk content monitoring

Marco Bolpagni; Valentina Fietta; Silvia Gabrielli
2026-01-01

Abstract

The rapid integration of conversational agents into digital mental health has outpaced the development of clinical governance frameworks. While chatbots increasingly serve as primary support tools, they lack standardized protocols for detecting high-risk user disclosures, leaving users vulnerable to inadequate interventions. To address this safety gap, this study aimed to identify specific signs of mental distress or harmful intent that mandate active monitoring during chatbot interactions. We proposed that the governance of such systems must be grounded in two foundational clinical pillars: mental health triage for immediate risk stratification and stepped care for hierarchical intervention. We employed a two-round eDelphi design with a purposive sample of 52 experts in clinical psychology, medicine, and human-computer interaction. In the first round, panelists evaluated a preliminary list of risk areas derived from the literature, suggesting modifications and expanding the list to ensure clinical comprehensiveness, before prioritizing the areas based on severity. The second round focused on refining the final list and, uniquely, mapping each validated area to a minimum necessary intervention level within a stepped-care model. The experts validated a final framework of 14 critical areas, fundamentally shifting risk monitoring from diagnostic labels to a symptom-based logic that aligns with the non-clinical capabilities of natural language processing. Beyond identifying what to monitor, the study established how systems should respond: experts mandated that high-acuity presentations, such as active suicidal intent or abuse, require immediate redirection to human services, while lower-acuity concerns, including social isolation and mild anxiety, were deemed suitable for autonomous management via self-help techniques or empathic listening. By grounding chatbot architecture in these clinical pillars, these findings provide a blueprint for safer automation in which conversational agents act as complementary tools capable of autonomously managing mild distress while serving as effective triage points for severe pathology. Future research should replicate and validate this framework with international and culturally diverse expert panels, explore its technical implementation in NLP architectures, and evaluate its clinical impact through real-world deployment in existing digital mental health interventions.
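The stepped-care mapping described in the abstract can be pictured as a small escalation table inside a chatbot's dialogue manager. The Python sketch below is purely illustrative and is not taken from the paper: the risk-area names, the three intervention tiers, and the escalation rule are assumptions standing in for the validated 14-area framework and its expert-assigned intervention levels.

    from enum import Enum

    # Illustrative stepped-care tiers; the paper's actual intervention levels may differ.
    class InterventionLevel(Enum):
        EMPATHIC_LISTENING = 1   # autonomous chatbot support
        GUIDED_SELF_HELP = 2     # autonomous self-help techniques
        HUMAN_REDIRECTION = 3    # immediate redirection to human services

    # Hypothetical mapping from detected risk areas to minimum necessary intervention.
    # These area names are examples, not the study's validated list of 14 critical areas.
    RISK_AREA_TO_LEVEL = {
        "active_suicidal_intent": InterventionLevel.HUMAN_REDIRECTION,
        "abuse_disclosure": InterventionLevel.HUMAN_REDIRECTION,
        "mild_anxiety": InterventionLevel.GUIDED_SELF_HELP,
        "social_isolation": InterventionLevel.EMPATHIC_LISTENING,
    }

    def minimum_intervention(detected_areas):
        """Return the most intensive minimum intervention required by any detected
        risk area, defaulting to empathic listening when nothing is flagged."""
        levels = [RISK_AREA_TO_LEVEL.get(area, InterventionLevel.EMPATHIC_LISTENING)
                  for area in detected_areas]
        if not levels:
            return InterventionLevel.EMPATHIC_LISTENING
        return max(levels, key=lambda level: level.value)

    # Example: a message flagged for both mild anxiety and active suicidal intent
    # escalates to human redirection.
    print(minimum_intervention(["mild_anxiety", "active_suicidal_intent"]).name)

The design choice in this sketch mirrors the abstract's central point: the most severe detected area, not the average, dictates the minimum response, so a single high-acuity signal always escalates the conversation to human services.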
Files in this record:
Identifying critical areas for safer mental health chatbot.pdf — open access; type: post-print; licence: Creative Commons; size: 3.04 MB; format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/370567