Creating a Gold Standard for Person Cross-Document Coreference Resolution in Italian News
Bentivogli, Luisa;Girardi, Christian;Pianta, Emanuele
2008-01-01
Abstract
This paper presents work toward a gold standard for cross-document coreference resolution of person entities in a corpus of Italian news. The gold standard was created by selecting a number of person names occurring in Adige-500K, a corpus composed of all the news stories published by the local newspaper L'Adige from 1999 to 2006. The corpus consists of 535,000 news stories, for a total of around 200 million tokens. To sample the person names in the corpus, we identified two dimensions, corresponding to two phenomena we intended to study, namely (i) the fame of the person entities and (ii) the ambiguity of person names. The first version of the gold standard is composed of 209 person names corresponding to 709 entities, for a total of 43,704 annotated documents.