Active feature sampling for low-cost feature evaluation

Veeramachaneni, Sriharsha; Avesani, Paolo; Olivetti, Emanuele

Knowledge discovery is traditionally performed under a tacit closed-world assumption: induction is performed on pre-acquired examples, and the possibility of acquiring additional information is ignored. The active learning paradigm addresses the problem of "intelligently" choosing the unlabelled examples for which class labels are acquired, without modifying the feature space. In contrast, we propose a strategy that actively samples the values of new features on class-labeled examples in order to revise the feature space, performing feature selection among candidate features that have initially not been extracted on any of the examples. The objective is to interleave the acquisition of feature values with the assessment of feature relevance, with the ultimate goal of selecting useful features at reduced sampling cost. We present an active feature sampling scheme that enables intelligent data acquisition by accurately predicting the relevance of a feature to the concept with a reduced number of feature value queries. The optimal selection method, based on maximization of information gain, is approximated by a heuristic algorithm. We demonstrate that active sampling is cost effective in accurately estimating feature relevance. An empirical evaluation on benchmark UCI databases shows that, on average, our algorithm incurs lower cost than random sampling and than a previous active sampling scheme proposed in the literature.
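
The interleaving of feature value acquisition and relevance assessment described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: relevance is assumed here to be the plug-in mutual information between a discrete feature and the class label, and the default selection policy is the random sampling baseline. The names `mutual_information`, `sample_and_estimate`, `query_feature`, and `choose` are hypothetical; an active policy such as the heuristic mentioned in the abstract would be supplied through `choose`.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(F; C) in bits from (feature_value, class_label) pairs.

    Assumption: relevance is measured as mutual information between a
    discrete feature and the class label.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    f_counts = Counter(f for f, _ in pairs)
    c_counts = Counter(c for _, c in pairs)
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts
        mi += (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
    return mi


def sample_and_estimate(labels, query_feature, budget, choose=None, seed=0):
    """Interleave feature value queries with relevance re-estimation.

    labels        -- class labels, known in advance for every example
    query_feature -- hypothetical callable: example index -> feature value
                     (each call stands for one costly acquisition)
    budget        -- maximum number of feature value queries
    choose        -- hypothetical policy: (unqueried, observed) -> index;
                     defaults to uniform random sampling (the baseline)
    Returns the relevance estimate after each query.
    """
    rng = random.Random(seed)
    if choose is None:
        def choose(unqueried, observed):
            return rng.choice(unqueried)
    unqueried = list(range(len(labels)))
    observed = []
    estimates = []
    for _ in range(min(budget, len(labels))):
        i = choose(unqueried, observed)       # pick the next example to query
        unqueried.remove(i)
        observed.append((query_feature(i), labels[i]))  # acquire the value
        estimates.append(mutual_information(observed))  # reassess relevance
    return estimates
```

For example, `sample_and_estimate(labels, lambda i: feature_column[i], budget=20)` returns the sequence of relevance estimates obtained under random sampling, where `feature_column` is a hypothetical list holding the feature values that would otherwise have to be acquired. Comparing how quickly such a sequence approaches the full-data estimate under different `choose` policies is one way to express the sampling cost comparison the abstract refers to.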