First-AID: the first Annotation Interface for grounded Dialogues
Menini, Stefano; Russo, Daniel; Aprosio, Alessio Palmero; Guerini, Marco
2025-01-01
Abstract
The swift advancement of Large Language Models (LLMs) has led to their widespread use across various tasks and domains, where they demonstrate remarkable generalization capabilities. However, achieving optimal performance in specialized tasks often requires fine-tuning LLMs with task-specific resources. Creating high-quality, human-annotated datasets for this purpose is challenging due to financial constraints and the limited availability of human experts. To address these limitations, we propose First-AID, a novel human-in-the-loop (HITL) data collection framework for the knowledge-driven generation of synthetic dialogues using LLM prompting. In particular, our framework implements data collection strategies that differ in the user intervention they require during dialogue generation, with the aim of reducing post-editing effort and enhancing the quality of the generated dialogues. We also evaluated First-AID on the collection of dialogues that counter misinformation and hate, demonstrating (1) its potential for efficient, high-quality data generation and (2) its adaptability to different practical constraints thanks to its three data collection strategies.

| File | Access | License | Size | Format |
|---|---|---|---|---|
| 2025.acl-demo.54.pdf | Open access | Creative Commons | 758.21 kB | Adobe PDF |
