ADAM: The SI-TAL Corpus of Annotated Dialogues

Cattoni, Roldano; Sandrini, Vanessa
2002-01-01

Abstract

In this paper we describe the methodological assumptions, general architectural framework, and annotation and encoding practices underlying the ADAM Corpus, which has been developed as part of the Italian national project SI-TAL. Each of the 450 dialogues is represented by an orthographic transcription and is annotated at five levels of linguistic information, namely prosody, POS tagging, syntax, semantics, and pragmatics. A coherent, unitary approach to the design and application of annotation schemes was pursued across all annotation levels. In developing the schemes, particular attention was paid to consistency with criteria of robustness, wide coverage and compliance with existing standards. The evaluation of the annotation revealed a high degree of both inter-annotator agreement and annotation accuracy, with very promising results regarding the usability of the proposed annotation schemes and the accuracy of the annotation applied to the corpus. The ADAM Corpus also represents an interesting experiment at the architectural design level: the way in which the annotation is organized and structured, as well as represented in a given physical format, aims at maximizing the reusability of the annotated material, so that the corpus can circulate widely across different annotation practices and research purposes.
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/659