Active Feature Sampling for Cost Constrained Knowledge Discovery

Veeramachaneni, Sriharsha; Avesani, Paolo; Olivetti, Emanuele
2004

Abstract

When knowledge discovery is viewed as an iterative process in which data collection and analysis are repeated in sequence, models learnt from the current data can be used to guide future data collection. The general approach of driving data collection using information from already acquired data is called active learning. The traditional active learning paradigm addresses the problem of choosing the unlabeled examples for which class labels are queried, without modifying the feature space. In contrast, we propose a strategy that actively samples the values of new features on class-labeled examples, with the objective of interleaving the acquisition of feature values and the assessment of feature relevance. We justify our algorithm on information-theoretic and statistical grounds. Using an illustrative example, we show that our active feature sampling scheme can enable the selection of relevant features with significantly lower data acquisition costs than random sampling.
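
The abstract does not spell out the sampling criterion, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a single discrete candidate feature, uses a plug-in mutual information estimate as the relevance score, and greedily picks the class whose next sample is expected to change that estimate the most. The function names, the smoothing parameter `alpha`, and the greedy rule itself are all hypothetical choices made for illustration.

```python
import math


def mutual_information(counts):
    """Plug-in estimate of I(C; X) in nats from a class-by-value count table."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for c, row in enumerate(counts):
        for x, n in enumerate(row):
            if n > 0:
                mi += (n / total) * math.log(n * total / (row_sums[c] * col_sums[x]))
    return mi


def expected_change(counts, c, alpha=1.0):
    """Expected absolute change in the relevance estimate if the feature
    value of one more example from class c were acquired (assumed criterion)."""
    row = counts[c]
    denom = sum(row) + alpha * len(row)
    current = mutual_information(counts)
    benefit = 0.0
    for x in range(len(row)):
        # Smoothed predictive probability of observing value x in class c.
        p = (row[x] + alpha) / denom
        updated = [list(r) for r in counts]
        updated[c][x] += 1
        benefit += p * abs(mutual_information(updated) - current)
    return benefit


def choose_class(counts):
    """Greedy choice: the class whose next sample is expected to move the estimate most."""
    return max(range(len(counts)), key=lambda c: expected_change(counts, c))


# counts[c][x] = number of class-c examples on which feature value x was observed.
counts = [[50, 50], [1, 1]]
next_class = choose_class(counts)
```

Under these assumptions the sketch directs the next acquisition toward the class where the estimate is still uncertain, whereas random sampling would spend the same budget without regard to what has already been observed.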

Use this identifier to cite or link to this document: http://hdl.handle.net/11582/2588