Translation as Annotation
Pianta, Emanuele; Bentivogli, Luisa
2003-01-01
Abstract
In this paper we illustrate an approach to the creation of high-quality linguistically annotated resources based on the exploitation of aligned parallel corpora. The approach rests on the key notion that translating a text can be seen as a linguistic annotation task that is easier than manual annotation with formal schemes. After translation, formal annotation can be derived automatically from the aligned translated texts. We will show that translations can be exploited in several ways to speed up and automate the linguistic annotation of texts. If none of the texts is already annotated, information from the aligned texts can be exploited to carry out the annotation from scratch. Conversely, if the texts in one language have been annotated and the others have not, annotations can be transferred from one language to the others. The transfer-based method allows existing (mostly English) annotated resources to be exploited to bootstrap the creation of annotated corpora in new languages with greatly reduced human effort.
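As an illustration of the transfer-based idea only, the following is a minimal sketch of how annotations might be projected across a word alignment. It is not the authors' implementation: the function name `project_annotations`, the part-of-speech tags, the English-Italian sentence pair, and the hand-written alignment are all hypothetical, and a real system would obtain the alignment from an automatic aligner and handle unaligned or many-to-many links.

```python
def project_annotations(source_tags, alignment, target_length):
    """Copy each source token's tag to the target token aligned with it.

    source_tags   : list of tags, one per source token (hypothetical tag set)
    alignment     : list of (source_index, target_index) pairs
    target_length : number of tokens in the target sentence
    Unaligned target tokens keep the value None.
    """
    target_tags = [None] * target_length
    for src_idx, tgt_idx in alignment:
        # Keep the first projected tag if several source tokens map to one target.
        if target_tags[tgt_idx] is None:
            target_tags[tgt_idx] = source_tags[src_idx]
    return target_tags


# Toy data, assumed for illustration: an annotated English sentence,
# its Italian translation, and a manually specified word alignment.
english = ["the", "black", "cat", "sleeps"]
english_tags = ["DET", "ADJ", "NOUN", "VERB"]
italian = ["il", "gatto", "nero", "dorme"]
alignment = [(0, 0), (1, 2), (2, 1), (3, 3)]

italian_tags = project_annotations(english_tags, alignment, len(italian))
for token, tag in zip(italian, italian_tags):
    print(token, tag)
```

In this toy case the Italian tokens receive their tags without any manual annotation of the Italian side, which is the kind of reduced human effort the abstract refers to.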