The MEANING Italian Corpus

Bentivogli, Luisa; Girardi, Christian; Pianta, Emanuele
2003-01-01

Abstract

The MEANING Italian Corpus (MIC) is a large corpus of contemporary written Italian being created at ITC-irst within the EU-funded MEANING project. Its novelty is that domain representativeness is the fundamental criterion for selecting the texts to include. A core set of 42 basic domains, broadly representative of all branches of knowledge, has been chosen for representation in the corpus. The corpus will be encoded in XML, taking into account, wherever the requirements of our NLP applications allow, the XML version of the Corpus Encoding Standard (XCES) and the new ISO/TC 37/SC 4 standard for language resources. A multi-level annotation is planned to encode seven kinds of information: orthographic features, text structure, morphosyntactic information, multiwords, syntactic information, named entities, and word senses.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/2004