Knowing how many different individuals carry the same name may improve the overall accuracy of a Person Cross Document Coreference System, which processes large corpora and clusters name mentions according to the individuals they refer to. In this paper we present a series of methods for estimating this number. In particular, an estimation method based on name perplexity brings a large improvement over the baseline given by the gap statistic. This method is instrumental in reaching accurate clustering results because it can not only predict the number of clusters with high confidence, but also indicate which type of clustering method works best for each particular name.
Title: | Methods of estimating the number of clusters for person cross document coreference task |
Authors: | |
Publication date: | 2012 |
Handle: | http://hdl.handle.net/11582/251425 |
Appears in categories: | 4.1 Contribution in conference proceedings |