Active feature sampling for low-cost feature evaluation

Veeramachaneni, Sriharsha; Avesani, Paolo; Olivetti, Emanuele
2004-01-01

Abstract

Knowledge discovery is traditionally performed under a tacit closed-world assumption: induction is performed on pre-acquired examples, and the possibility of acquiring additional information is ignored. The active learning paradigm addresses the problem of intelligently choosing the unlabelled examples for which class labels are acquired, without modifying the feature space. In contrast, we propose a strategy that actively samples the values of new features on class-labeled examples in order to revise the feature space, performing feature selection among candidate features that have initially not been extracted on any of the examples. The objective is to interleave the acquisition of feature values with the assessment of feature relevance, with the ultimate goal of selecting useful features at reduced sampling cost. We present an active feature sampling scheme that enables intelligent data acquisition by accurately predicting the relevance of a feature to the concept with a reduced number of feature value queries. The optimal selection method, based on maximization of information gain, is approximated by a heuristic algorithm. We demonstrate that active sampling is cost-effective in accurately estimating feature relevance. An empirical evaluation on benchmark UCI databases shows that, on average, our algorithm incurs lower cost than random sampling and than a previous active sampling scheme proposed in the literature.
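
The abstract does not give the relevance estimator or the query-selection heuristic, so the following is only an illustrative sketch of the random sampling baseline it mentions, not the authors' active scheme. It assumes that feature relevance is measured as the empirical mutual information between class label and feature value; that choice, and the names `mutual_information`, `random_sampling_estimates` and `query_feature`, are hypothetical and introduced here for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information (in bits) between class and feature value.

    `pairs` is a list of (class_label, feature_value) tuples.
    Assumption: this stands in for "feature relevance"; the abstract
    does not say which estimator the authors use.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    class_counts = Counter(c for c, _ in pairs)
    value_counts = Counter(v for _, v in pairs)
    mi = 0.0
    for (c, v), n_cv in joint.items():
        # p(c, v) * log2( p(c, v) / (p(c) * p(v)) ), written with raw counts
        mi += (n_cv / n) * math.log2(n_cv * n / (class_counts[c] * value_counts[v]))
    return mi


def random_sampling_estimates(labels, query_feature, budget, seed=0):
    """Random sampling baseline: query the feature on `budget` examples
    chosen uniformly at random, re-estimating relevance after each query.

    `query_feature(i)` is a hypothetical callback that returns the
    feature value of example i; each call counts as one unit of cost.
    """
    rng = random.Random(seed)
    order = rng.sample(range(len(labels)), budget)
    sampled, estimates = [], []
    for i in order:
        sampled.append((labels[i], query_feature(i)))  # one feature-value query
        estimates.append(mutual_information(sampled))
    return estimates


# Usage: class labels are known in advance; feature values are queried on demand.
labels = [0, 1] * 50
informative = random_sampling_estimates(labels, lambda i: labels[i], budget=100)
uninformative = random_sampling_estimates(labels, lambda i: 0, budget=100)
```

An active scheme would replace the uniformly random `order` with a rule that picks the next example to query based on the values acquired so far, aiming to reach an accurate relevance estimate with fewer queries.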

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/2558