Dynamic Threshold for Clustering Person Names
Zanoli, Roberto; Corcoglioniti, Francesco; Girardi, Christian
2012-01-01
Abstract
Searching the Web for people appears to be one of the most common activities of Internet users. However, person names can be highly ambiguous, and consequently search results are often a mix of documents about different people who share the same name. This paper presents a system that identifies which mentions of a person name in different documents refer to the same person entity. Unlike other systems, which adopt a fixed similarity threshold to group documents about the same person, the presented approach uses a threshold whose value changes according to the ambiguity of the name, as estimated from external resources (i.e., phonebooks). For each name, the algorithm was provided with a name-specific threshold value and with a rich set of features (e.g., named entities) extracted from the document in which the person name is mentioned. The performance of the system was evaluated by taking part in the News People Search (NePS) task at Evalita 2011.
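The abstract does not state how name ambiguity is turned into a threshold or how documents are grouped, so the following is only a minimal sketch of the general idea under explicit assumptions, not the authors' method. It assumes a phonebook gives the number of people listed under a name, and maps that count to a threshold with a hypothetical logarithmic function (`dynamic_threshold`, with made-up parameters `base`, `scale` and `cap`): the more people share a name, the higher the similarity required to merge two documents. Documents are represented as sets of features such as named entities, compared with Jaccard similarity, and grouped by single-link clustering; all of these are illustrative choices.

```python
import math
from itertools import combinations


def dynamic_threshold(phonebook_count: int, base: float = 0.1,
                      scale: float = 0.05, cap: float = 0.9) -> float:
    """Hypothetical mapping from name ambiguity to a similarity threshold.

    Assumption: more phonebook entries for a name means a more ambiguous
    name, so a higher similarity is required before merging documents.
    """
    return min(cap, base + scale * math.log1p(phonebook_count))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets (e.g. named entities)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(features: list[set], threshold: float) -> list[list[int]]:
    """Single-link clustering: link every pair of documents whose similarity
    reaches the threshold, then return the connected components."""
    parent = list(range(len(features)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if jaccard(features[i], features[j]) >= threshold:
            parent[find(i)] = find(j)  # union the two components

    clusters: dict[int, list[int]] = {}
    for i in range(len(features)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


# Toy example: three documents mentioning the same (hypothetical) name.
docs = [
    {"Juventus", "Torino", "calcio"},
    {"Juventus", "calcio", "Serie A"},
    {"Senato", "Roma", "governo"},
]
# Rare name (few phonebook entries): low threshold, documents 0 and 1 merge.
rare = cluster_documents(docs, dynamic_threshold(phonebook_count=3))
print(rare)    # [[0, 1], [2]]
# Very common name: higher threshold, the same pair stays separate.
common = cluster_documents(docs, dynamic_threshold(phonebook_count=5000))
print(common)  # [[0], [1], [2]]
```

Documents 0 and 1 have a Jaccard similarity of 0.5, which exceeds the threshold computed for the rare name (about 0.17) but not the one computed for the common name (about 0.53). The same evidence therefore merges them only when the name is unlikely to be shared by many people.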