Person number estimation in large corpora

Popescu, Octavian; Corcoglioniti, Francesco; Zanoli, Roberto
2012

Abstract

In this paper we present various methods for estimating the K-number, the number of distinct entities carrying the same name in a corpus, together with an analysis of their characteristics and their impact on the person cross-document coreference (PCDC) task. These methods fall into two main classes: corpus-based and external-resource-based. The experiments reported here show that estimating the K-number plays an important role in PCDC, from understanding the complexity of the task to improving the overall accuracy of coreference.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/251427