Nowadays, surfing the Web and looking for persons seems to be one of the most common activities of Internet users. However, person names could be highly ambiguous and consequently search results are often a collection of documents about different people sharing the same name. In this paper a cross-document coreference system able to identify person names in different documents which refer to the same person entity is presented. The system exploits background knowledge through two mechanisms: (1) the use of a dynamic similarity threshold for clustering person names, which depends on the ambiguity of the name estimated using a phonebook; and (2) the disambiguation of names against a knowledge base containing person descriptions, using an entity linking system and including its output as an additional feature for computing similarity. The paper describes the system and reports its performance tested taking part in the News People Search (NePS) task at Evalita 2011. A version of the system is being used in a real-word application, which requires to corefer millions of names from multimedia sources.
Exploiting Background Knowledge for Clustering Person Names
Zanoli, Roberto;Corcoglioniti, Francesco;Girardi, Christian
2013-01-01
Abstract
Nowadays, surfing the Web and looking for persons seems to be one of the most common activities of Internet users. However, person names could be highly ambiguous and consequently search results are often a collection of documents about different people sharing the same name. In this paper a cross-document coreference system able to identify person names in different documents which refer to the same person entity is presented. The system exploits background knowledge through two mechanisms: (1) the use of a dynamic similarity threshold for clustering person names, which depends on the ambiguity of the name estimated using a phonebook; and (2) the disambiguation of names against a knowledge base containing person descriptions, using an entity linking system and including its output as an additional feature for computing similarity. The paper describes the system and reports its performance tested taking part in the News People Search (NePS) task at Evalita 2011. A version of the system is being used in a real-word application, which requires to corefer millions of names from multimedia sources.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.