Active Feature Sampling for Low Cost Feature Evaluation

Veeramachaneni, Sriharsha; Olivetti, Emanuele; Avesani, Paolo

Active learning is the general approach of using information from previously acquired data to automatically drive further data collection. The traditional active learning paradigm addresses the problem of choosing which unlabeled examples to query for class labels, with the goal of learning a classifier. In contrast, we address the problem of active feature sampling for cost-constrained feature selection. We propose a strategy that actively samples the values of new features on class-labeled examples, with the objective of interleaving the acquisition of feature values and the assessment of feature relevance. We derive a novel active feature sampling algorithm from an information-theoretic and statistical formulation of the problem. We present experimental results on synthetic, UCI, and real-world datasets to demonstrate that our active sampling algorithm can provide accurate estimates of feature relevance with significantly lower data acquisition costs than random sampling and other previously proposed sampling algorithms.
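
To make the setting concrete, the following is a minimal illustrative sketch of interleaving feature-value acquisition with relevance estimation. It is not the algorithm derived in the paper. It assumes a single discrete feature, uses empirical mutual information with the class label as a stand-in relevance measure, and defaults to the random-sampling baseline. The names `mutual_information`, `sample_and_estimate`, `query_feature`, and `choose` are hypothetical; an active policy would be supplied through `choose`.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information (bits) between a discrete feature value x
    and a class label y, given a list of (x, y) pairs.
    Assumption: used here only as an example relevance measure."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def sample_and_estimate(labels, query_feature, budget, choose=None, seed=0):
    """Acquire feature values one at a time on class-labeled examples and
    re-estimate relevance after every acquisition.

    labels        -- class label of each example (already known)
    query_feature -- hypothetical costly oracle: example index -> feature value
    budget        -- maximum number of feature values to acquire
    choose        -- optional policy (unsampled, acquired, labels) -> index;
                     when None, the random-sampling baseline is used
    """
    rng = random.Random(seed)
    unsampled = list(range(len(labels)))
    acquired = []   # (feature value, class label) pairs seen so far
    estimates = []  # relevance estimate after each acquisition
    for _ in range(min(budget, len(labels))):
        if choose is None:
            i = rng.choice(unsampled)
        else:
            i = choose(unsampled, acquired, labels)
        unsampled.remove(i)
        acquired.append((query_feature(i), labels[i]))
        estimates.append(mutual_information(acquired))
    return estimates


# Usage: a toy feature that equals the class label.
toy_labels = [0, 1] * 10
toy_estimates = sample_and_estimate(toy_labels, lambda i: toy_labels[i], budget=20)
```

Comparing the `estimates` trajectories produced by different `choose` policies against the value obtained from the fully sampled data is one simple way to see how quickly each policy approaches an accurate relevance estimate for a given acquisition budget.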