The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus

Falavigna, Giuseppe Daniele
2002-01-01

Abstract

C-ORAL-ROM is a multilingual corpus of spontaneous speech of approximately 1,200,000 words representing the four main Romance languages: French, Italian, Portuguese and Spanish. The resource will be delivered in standard textual format, aligned to the audio source in a multimedia edition. C-ORAL-ROM aims to ensure both a sufficient representation of spontaneous speech variation in each language resource and comparability among the four resources with respect to a defined set of variation parameters. The multimedia design of C-ORAL-ROM allows simultaneous text-to-sound alignment and full access to the acoustic information through the speech software WINPITCHCORPUS. The storage of spoken language resources is based on the identification of utterances in the four corpora through perceptually relevant prosodic properties. In C-ORAL-ROM, all the textual information is tagged simultaneously with respect to prosodic parsing and utterance boundaries. Each prosodic unit corresponding to an utterance is directly aligned to its acoustic counterpart, thus ensuring a natural text-sound correspondence and the definition of a database of possible speech acts in the four Romance languages.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/197