Exploiting Background Knowledge for Clustering Person Names

Zanoli, Roberto; Corcoglioniti, Francesco; Girardi, Christian
2013-01-01

Abstract

Searching the Web for people appears to be one of the most common activities of Internet users. However, person names can be highly ambiguous, so search results often mix documents about different people who share the same name. This paper presents a cross-document coreference system that identifies person names in different documents that refer to the same person entity. The system exploits background knowledge through two mechanisms: (1) a dynamic similarity threshold for clustering person names, which depends on the ambiguity of each name as estimated from a phonebook; and (2) disambiguation of names against a knowledge base of person descriptions, performed by an entity linking system whose output is included as an additional feature when computing similarity. The paper describes the system and reports its performance in the News People Search (NePS) task at Evalita 2011, in which it took part. A version of the system is being used in a real-world application that must corefer millions of names from multimedia sources.
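The abstract does not give the actual similarity function, threshold formula or clustering algorithm. The following is therefore only a minimal Python sketch of how the two mechanisms could fit together, under stated assumptions. All names in it are hypothetical illustrations and not the system described in the paper. This includes `Mention`, `dynamic_threshold`, `similarity` and `cluster`. It also includes the logarithmic threshold rule, the ±0.5 entity-linking adjustment, bag-of-words cosine similarity and single-link clustering. The phonebook is modeled as a plain dictionary mapping a name to the number of people listed under it.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    """One person-name occurrence in one document (hypothetical structure)."""
    doc_id: str
    name: str                     # normalized full name, e.g. "mario rossi"
    context: Counter              # bag of words around the mention
    linked_entity: Optional[str]  # KB id assigned by an entity linker, or None


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dynamic_threshold(name: str, phonebook: dict,
                      base: float = 0.2, scale: float = 0.08) -> float:
    """Assumed rule: the more people share a name in the phonebook, the
    higher the similarity needed to merge two of its mentions.
    Unknown names are treated as held by a single person."""
    count = phonebook.get(name, 1)
    return min(0.95, base + scale * math.log(1 + count))


def similarity(m1: Mention, m2: Mention, link_weight: float = 0.5) -> float:
    """Context similarity plus an assumed entity-linking feature."""
    sim = cosine(m1.context, m2.context)
    if m1.linked_entity and m2.linked_entity:
        # Same KB entity raises the score; different KB entities lower it.
        sim += link_weight if m1.linked_entity == m2.linked_entity else -link_weight
    return sim


def cluster(mentions: list, phonebook: dict) -> list:
    """Single-link clustering of same-name mentions using union-find.
    Returns sorted groups of document ids."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            a, b = mentions[i], mentions[j]
            if a.name != b.name:
                continue  # only mentions sharing a name are compared
            if similarity(a, b) >= dynamic_threshold(a.name, phonebook):
                parent[find(i)] = find(j)

    groups: dict = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m.doc_id)
    return sorted(groups.values())


if __name__ == "__main__":
    phonebook = {"mario rossi": 5000, "ugo zanetti": 2}
    mentions = [
        Mention("d1", "mario rossi", Counter({"football": 2, "goal": 1}), "kb:footballer"),
        Mention("d2", "mario rossi", Counter({"football": 1, "coach": 1}), "kb:footballer"),
        Mention("d3", "mario rossi", Counter({"parliament": 1, "law": 1}), "kb:politician"),
        Mention("d4", "ugo zanetti", Counter({"bakery": 1, "bread": 1}), None),
        Mention("d5", "ugo zanetti", Counter({"bakery": 1, "cake": 1}), None),
    ]
    print(cluster(mentions, phonebook))  # [['d1', 'd2'], ['d3'], ['d4', 'd5']]
```

Under these assumptions, two mentions whose contexts have a cosine similarity of about 0.63 are merged when the name is rare (threshold about 0.29). The same pair stays separate for a common name (threshold about 0.88), unless the entity linker maps both mentions to the same knowledge-base entry. The logarithm is one possible way to keep very frequent names from pushing the threshold out of reach.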
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/105802