Unraveling ChatGPT: A Critical Analysis of AI-Generated Goal-Oriented Dialogues and Annotations
Tiziano Labruna, Sofia Brenna, Andrea Zaninello, Bernardo Magnini
2023-01-01
Abstract
Large pre-trained language models have exhibited unprecedented capabilities in producing high-quality text via prompting. This opens new possibilities for data collection and annotation, particularly in situations where such data are scarce, complex to gather, expensive, or even sensitive. In this paper, we explore the potential of pre-trained language models to generate and annotate goal-oriented dialogues, and conduct an in-depth analysis to evaluate their quality. Our experiments employ ChatGPT and encompass three categories of goal-oriented dialogues (task-oriented, collaborative, and explanatory), two generation modes (interactive and one-shot), and two languages (English and Italian). Through extensive human-based evaluations, we demonstrate that the quality of generated dialogues is on par with that of human-generated ones. On the other hand, we show that the complexity of dialogue annotation schemas (e.g., for dialogue state tracking) exceeds the capabilities of current language models, and that this task still requires substantial human supervision.