Active Feature Sampling for Cost Constrained Knowledge Discovery
Veeramachaneni, Sriharsha; Avesani, Paolo; Olivetti, Emanuele
2004-01-01
Abstract
When knowledge discovery is viewed as an iterative process wherein the data collection and analysis parts are repeated in sequence, models learnt from the current data can be used to provide support for future data collection. The general approach for driving data collection using information from already acquired data is called active learning. The traditional active learning paradigm addresses the problem of choosing which unlabeled examples to query for class labels, with the feature space held fixed. In contrast, we propose a strategy that actively samples the values of new features on class-labeled examples, with the objective of interleaving the acquisition of feature values and the assessment of feature relevance. We justify our algorithm on information-theoretic and statistical grounds. Using an illustrative example, we show that our active feature sampling scheme can enable the selection of relevant features with significantly lower data acquisition costs than random sampling.
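The abstract does not spell out the sampling rule, so the following is not the authors' algorithm. It is a minimal sketch of the general idea under loudly stated assumptions: a binary class, one costly binary candidate feature, relevance scored as the mutual information I(X;Y) between feature and class, and Beta(1, 1) priors on P(X=1 | Y=c). The "active" policy here is a simple variance heuristic chosen for illustration. It acquires the feature value on an example from the class whose estimate is most uncertain, and it is compared with random sampling of examples. All names (`sample_feature`, `beta_variance`, and so on) are hypothetical.

```python
import math
import random


def entropy(p):
    """Binary entropy in nats; defined as 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mutual_information(prior, theta):
    """I(X;Y) for binary X, with P(Y=c) = prior[c] and P(X=1 | Y=c) = theta[c]."""
    p_x1 = sum(prior[c] * theta[c] for c in prior)
    h_x_given_y = sum(prior[c] * entropy(theta[c]) for c in prior)
    return entropy(p_x1) - h_x_given_y


def beta_variance(a, b):
    """Variance of a Beta(a, b) posterior."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def sample_feature(labels, feature, budget, policy, seed=0):
    """Spend `budget` acquisitions of a costly binary feature on labeled examples.

    labels[i] is the known class of example i; feature[i] is its hidden feature
    value, read only when example i is sampled. Returns (MI estimate, counts),
    where counts[c] = [ones + 1, zeros + 1] (Beta(1, 1) pseudo-counts).
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    # Unsampled example indices per class, shuffled so pop() picks at random.
    pool = {c: [i for i, y in enumerate(labels) if y == c] for c in classes}
    for c in classes:
        rng.shuffle(pool[c])
    counts = {c: [1, 1] for c in classes}
    for _ in range(budget):
        available = [c for c in classes if pool[c]]
        if not available:
            break  # every example already has its feature value
        if policy == "active":
            # Hypothetical heuristic: query the class whose P(X=1 | Y=c) is most uncertain.
            c = max(available, key=lambda k: beta_variance(*counts[k]))
        elif policy == "random":
            # Uniform over all unsampled examples: weight classes by pool size.
            sizes = [len(pool[k]) for k in available]
            c = rng.choices(available, weights=sizes)[0]
        else:
            raise ValueError(f"unknown policy: {policy!r}")
        i = pool[c].pop()
        counts[c][0 if feature[i] == 1 else 1] += 1
    theta = {c: counts[c][0] / sum(counts[c]) for c in classes}
    return mutual_information(prior, theta), counts


if __name__ == "__main__":
    # Toy pool: 90 examples of class 0, 10 of class 1; the feature equals the class.
    labels = [0] * 90 + [1] * 10
    feature = list(labels)
    for policy in ("active", "random"):
        mi, counts = sample_feature(labels, feature, budget=20, policy=policy)
        print(policy, round(mi, 3), counts)
```

On the 90/10 toy pool, the variance rule alternates between the two classes and spends ten of the twenty acquisitions on each. Random sampling would be expected to spend only about two on the minority class, which leaves its estimate of P(X=1 | Y=1), and hence of the feature's relevance, poorly determined.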