Digital health interventions often require structured, protocol-driven dialogues delivered with high fidelity. Evaluating whether an agent powered by a Large Language Model (LLM) can meet these requirements remains challenging, especially in the early stages of development. In this work, we present VALISE (Virtual Agent Laboratory for Instruction-Following Simulation and Evaluation), a modular framework for simulating and evaluating the behavior of LLM agents that deliver structured health interventions. VALISE enables configurable agent–patient simulations using synthetic personas and evaluates protocol adherence through a customizable, automated assessment grid scored by ensembles of LLM-based judges. We demonstrate its use with Brief Action Planning (BAP), a short intervention promoting behavior change in sedentary individuals. In our results, LLM-based judgments align strongly with expert annotations, supporting VALISE’s effectiveness for early-stage evaluation. VALISE offers a reproducible, extensible platform for testing the instruction-following capabilities of LLM agents in digital health.
VALISE: A Virtual Agent Laboratory for Instruction-Following Simulation and Evaluation of LLM-Powered Digital Health Interventions
Marco Bolpagni; Simone De Carli; Leonardo Sanna; Mauro Dragoni; Silvia Gabrielli
2025-01-01
