Making explicit the hidden semantics of hierarchical classifications
Magnini, Bernardo; Serafini, Luciano; Speranza, Manuela
2003-01-01
Abstract
Hierarchical classifications are concept hierarchies used to organize large collections of documents. File systems, product taxonomies for marketplaces, and the directories provided by Web portals are common examples. As semi-structured knowledge sources, hierarchical classifications have peculiar features: they differ from plain text because they are based on a taxonomy of concepts, and from structured data sources (such as databases and formal ontologies) because many of their semantic relations are implicit. We propose a methodology for building a semantic interpretation of hierarchical classifications by analyzing their taxonomic relations and the linguistic material they contain. We provide a formal semantics for hierarchical classifications and then use that formal framework to interpret the implicit knowledge they represent, exploring a number of crucial linguistic issues. The phenomena addressed include the disambiguation of polysemous words, the semantics of multiwords, and the interpretation of coordinations. The Web directories of Google and Yahoo! were chosen as the evaluation set. We show that there is a considerable amount of information to be made explicit, and we discuss the performance of an implementation of our analysis.