Data Compression and Local Metrics for Nearest Neighbor Classification
Ricci, Francesco;Avesani, Paolo
1999-01-01
Abstract
A local distance measure for the nearest neighbor classification rule is shown to achieve high compression rates on real data sets. In the approach proposed here, first a set of prototypes is extracted from the training set, and then a learning algorithm, similar to the delta rule, is used to optimize the metric. Even though the prototypes are selected without running any particular optimization algorithm, the proposed metric outperforms, both in compression rate and accuracy, common editing procedures like ICA, RNN and PNN. Theoretical arguments are presented as a rationale for the experimental results. The local metric proposed here can learn exactly a general class of concepts using fewer examples than those required by the Euclidean metric. Finally, when accuracy is the major concern, it is shown how compression can be traded for accuracy by exploiting voting techniques.
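The scheme the abstract describes can be illustrated with a minimal sketch: each prototype carries its own feature-weight vector (the local metric), 1-NN classification uses the weighted distance, and an error-driven, delta-rule-like update adapts the weights. All function names, the exact update rule, and the learning-rate and epoch values below are illustrative assumptions, not the authors' actual algorithm.

```python
import numpy as np

def local_distance(x, prototype, weights):
    # Per-prototype weighted squared Euclidean distance: each
    # prototype has its own feature weights (the "local metric").
    return np.sum(weights * (x - prototype) ** 2)

def classify(x, prototypes, proto_labels, W):
    # 1-NN rule under the local metric of each prototype.
    dists = [local_distance(x, p, w) for p, w in zip(prototypes, W)]
    return proto_labels[int(np.argmin(dists))]

def train_weights(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    # Illustrative delta-rule-style update: shrink the weights of the
    # nearest prototype on a correct classification (enlarging its
    # region of influence) and grow them on an error (shrinking it).
    W = np.ones((len(prototypes), X.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, y):
            dists = [local_distance(x, p, w) for p, w in zip(prototypes, W)]
            k = int(np.argmin(dists))
            grad = (x - prototypes[k]) ** 2  # d(distance)/d(weights)
            if proto_labels[k] == t:
                W[k] -= lr * grad
            else:
                W[k] += lr * grad
            W[k] = np.clip(W[k], 1e-6, None)  # keep the metric valid
    return W
```

Because only the stored prototypes and their weight vectors are kept, the training set itself can be discarded after training, which is the source of the compression the abstract refers to.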