Managing the hierarchical organization of data is starting to play a key role in the knowledge management community due to the great amount of human resources needed to create and maintain these organized repositories of information. Machine learning community has in part addressed this problem by developing hierarchical supervised classifiers that help maintainers to categorize new resources within given hierarchies. Although such learning models succeed in exploiting relational knowledge, they are highly demanding in terms of labeled examples, because the number of categories is related to the dimension of the corresponding hierarchy. Hence, the creation of new directories or the modification of existing ones require strong investments. This paper proposes a semi-automatic process (interleaved with human suggestions) whose aim is to minimize (simplify) the work required to the administrators when creating, modifying, and maintaining directories. Within this process, bootstrapping a taxonomy with examples represents a critical factor for the effective exploitation of any supervised learning model. For this reason we propose a method for the bootstrapping process that makes a first hypothesis of categorization for a set of unlabeled documents, with respect to a given empty hierarchy of concepts. Based on a revision of Self-Organizing Maps, namely TaxSOM, the proposed model performs an unsupervised classification, exploiting the a- priori knowledge encoded in a taxonomy structure both at the terminological and topological level. The ultimate goal of TaxSOM is to create the premise for successfully training a supervised classifier

Bootstrapping for Hierarchical Document Classification

Adami, Giordano;Avesani, Paolo;Sona, Diego
2003-01-01

Abstract

Managing the hierarchical organization of data is starting to play a key role in the knowledge management community due to the great amount of human resources needed to create and maintain these organized repositories of information. Machine learning community has in part addressed this problem by developing hierarchical supervised classifiers that help maintainers to categorize new resources within given hierarchies. Although such learning models succeed in exploiting relational knowledge, they are highly demanding in terms of labeled examples, because the number of categories is related to the dimension of the corresponding hierarchy. Hence, the creation of new directories or the modification of existing ones require strong investments. This paper proposes a semi-automatic process (interleaved with human suggestions) whose aim is to minimize (simplify) the work required to the administrators when creating, modifying, and maintaining directories. Within this process, bootstrapping a taxonomy with examples represents a critical factor for the effective exploitation of any supervised learning model. For this reason we propose a method for the bootstrapping process that makes a first hypothesis of categorization for a set of unlabeled documents, with respect to a given empty hierarchy of concepts. Based on a revision of Self-Organizing Maps, namely TaxSOM, the proposed model performs an unsupervised classification, exploiting the a- priori knowledge encoded in a taxonomy structure both at the terminological and topological level. The ultimate goal of TaxSOM is to create the premise for successfully training a supervised classifier
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/2010
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
social impact