Person number estimation in large corpora
Popescu, Octavian;Corcoglioniti, Francesco;Zanoli, Roberto
2012-01-01
Abstract
In this paper we present various methods for estimating the K-number, the number of distinct entities carrying the same name in a corpus, together with an analysis of their characteristics and their impact on the person cross-document coreference (PCDC) task. These methods fall into two important classes: corpus-based and external-resource-based. The experiments reported here show that estimating the K-number plays an important role in PCDC, from understanding the complexity of the task to improving the overall accuracy of coreference.