Active Feature Sampling for Low Cost Feature Evaluation

Veeramachaneni, Sriharsha; Olivetti, Emanuele; Avesani, Paolo
2005

Abstract

The general approach of automatically driving data collection using information from previously acquired data is called active learning. The traditional active learning paradigm addresses the problem of choosing which unlabeled examples to query for class labels, with the goal of learning a classifier. In contrast, we address the problem of active feature sampling for cost-constrained feature selection. We propose a strategy that actively samples the values of new features on class-labeled examples, with the objective of interleaving the acquisition of feature values and the assessment of feature relevance. We derive a novel active feature sampling algorithm from an information-theoretic and statistical formulation of the problem. We present experimental results on synthetic, UCI, and real-world datasets to demonstrate that our active sampling algorithm can provide accurate estimates of feature relevance with significantly lower data acquisition costs than random sampling and other previously proposed sampling algorithms.

Use this identifier to cite or link to this document: http://hdl.handle.net/11582/2629