Unsupervised Categorization Exploiting a-priori Knowledge of a Taxonomy
Adami, Giordano;Avesani, Paolo;Sona, Diego
2003-01-01
Abstract
Hierarchical categorization of documents is a task receiving growing interest in both the information retrieval and machine learning communities. Although hierarchical supervised classifiers seem to exploit relational knowledge, they also require an increasing amount of labelled examples. Bootstrapping hierarchical supervised classifiers is becoming a critical issue because the total number of labelled examples required is related to the size of the concept hierarchy. This work proposes a solution to the bootstrap problem based on self-organizing maps. This well-known model is revised to exploit the a-priori knowledge encoded in a taxonomy structure at both the terminological and topological levels. An experimental evaluation has been performed on a collection of taxonomies extracted from the Google web directory.