Unraveling ChatGPT: A Critical Analysis of AI-Generated Goal-Oriented Dialogues and Annotations

Tiziano Labruna; Sofia Brenna; Andrea Zaninello; Bernardo Magnini
2023-01-01

Abstract

Large pre-trained language models have exhibited unprecedented capabilities in producing high-quality text via prompting techniques. This introduces new possibilities for data collection and annotation, particularly in situations where such data are scarce, complex to gather, expensive, or even sensitive. In this paper, we explore the potential of pre-trained language models to generate and annotate goal-oriented dialogues, and conduct an in-depth analysis to evaluate their quality. Our experiments employ ChatGPT and encompass three categories of goal-oriented dialogues (task-oriented, collaborative, and explanatory), two generation modes (interactive and one-shot), and two languages (English and Italian). Through extensive human-based evaluations, we demonstrate that the quality of generated dialogues is on par with that of human-generated ones. On the other hand, we show that the complexity of dialogue annotation schemas (e.g., for dialogue state tracking) exceeds the capacity of current language models, and this task still requires substantial human supervision.
Year: 2023
ISBN: 9783031475450
ISBN: 9783031475467

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/346907