Active Sampling for Feature Selection
Veeramachaneni, Sriharsha;Avesani, Paolo
2003-01-01
Abstract
In many knowledge discovery applications, the data mining step is followed by further data acquisition. New data may consist of new instances and/or new features for the old instances. When new features are to be added, an acquisition policy can help decide which features to acquire based on their predictive capability and their acquisition cost. This can be posed as a feature selection problem in which the feature values are not known in advance. We propose a technique that actively samples the feature values, with the ultimate goal of choosing between alternative candidate features at minimum sampling cost. Our algorithm is based on extracting candidate features in a "region" of the instance space where the feature value is likely to alter our knowledge the most. An experimental evaluation on a standard database shows that it is possible to outperform a random subsampling policy in terms of the accuracy of feature evaluation.