The IDRE Dataset in Practice: Training and Evaluation of Small-to-Medium-Sized LLMs for Empathetic Rephrasing

Simone Manai; Roberto Zanoli; Alberto Lavelli
2025-01-01

Abstract

Integrating emotional intelligence into AI systems is essential for developing empathetic chatbots, yet deploying fully empathetic models is often constrained by business, ethical, and computational factors. We propose an innovative solution: a dedicated empathy rephrasing layer that operates downstream of a chatbot’s initial response. This layer leverages large language models (LLMs) to infuse empathy into the chatbot’s output without altering its core meaning, thereby enhancing emotional intelligence and user engagement. To implement this layer, we extend and validate the IDRE (Italian Dialogue for Empathetic Responses) dataset. We evaluated small- and medium-scale LLMs across three configurations: baseline models, models augmented via few-shot learning with IDRE exemplars, and models fine-tuned on IDRE. Performance was quantitatively assessed using the LLM-as-a-judge paradigm, leveraging custom metrics. These results were further validated through an independent human evaluation and supported by established NLP similarity metrics, ensuring a robust triangulation of findings. Results confirm that both few-shot prompting and fine-tuning with IDRE significantly enhance the models’ capacity for empathetic language generation. Applications include empathetic AI in healthcare, such as virtual assistants for patient support, and demonstrate promising generalization to other domains. All datasets, prompts, fine-tuned models, and scripts are publicly available to ensure transparency and reproducibility.
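The abstract describes a rephrasing layer that sits downstream of a chatbot's initial response and uses an LLM, optionally conditioned on few-shot exemplars, to make the reply more empathetic without changing its meaning. A minimal sketch of that pipeline is below; the prompt wording and the exemplar are illustrative placeholders (not drawn from the IDRE dataset), and `call_llm` stands in for whatever LLM backend is used.

```python
# Illustrative sketch of a downstream empathy rephrasing layer.
# The exemplar and prompt text are hypothetical, not IDRE data.

FEW_SHOT_EXEMPLARS = [
    ("Your appointment is at 9 AM.",
     "I know waiting can be stressful; your appointment is at 9 AM, "
     "and we'll be ready for you."),
]

def build_rephrase_prompt(chatbot_reply: str) -> str:
    """Assemble a few-shot prompt asking an LLM to rephrase the reply
    empathetically while preserving its core meaning."""
    lines = ["Rephrase the following reply to be more empathetic "
             "without changing its meaning.\n"]
    for original, empathetic in FEW_SHOT_EXEMPLARS:
        lines.append(f"Original: {original}")
        lines.append(f"Empathetic: {empathetic}\n")
    lines.append(f"Original: {chatbot_reply}")
    lines.append("Empathetic:")
    return "\n".join(lines)

def empathy_layer(chatbot_reply: str, call_llm) -> str:
    """Run the rephrasing layer downstream of the chatbot's
    initial response; call_llm is any prompt -> text function."""
    return call_llm(build_rephrase_prompt(chatbot_reply))
```

Because the layer only post-processes the chatbot's output, it can be attached to an existing system without retraining the base chatbot, which matches the business and computational constraints the abstract mentions.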

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/367488