Advanced Metrics for Class-Driven Similarity Search
Avesani, Paolo; Blanzieri, Enrico; Ricci, Francesco
1999-01-01
Abstract
This paper presents two metrics for the Nearest Neighbor classifier that share the property of being adapted, i.e. learned, on a set of data. Both metrics can be used for similarity search when the retrieval critically depends on a symbolic target feature. The first, called the Local Asymmetrically Weighted Similarity Metric (LASM), exploits reinforcement learning techniques for the computation of asymmetric weights. The learning procedure of LASM initially extracts a set of prototypes from the training data and then iteratively optimizes its parameters on the remaining data. Experiments on benchmark datasets show that LASM maintains good accuracy and achieves high compression rates, outperforming competing editing techniques such as Condensed Nearest Neighbor. From a completely different perspective, the second metric, called the Minimum Risk Metric (MRM), is based on probability estimates. MRM is optimal in the sense that it minimizes the finite misclassification risk, and it experimentally outperforms other probability-based metrics such as the Short and Fukunaga metric and the Difference Value Metrics. MRM can be implemented with different probability estimates and performs comparably to the Bayes classifier based on the same estimates. Both LASM and MRM outperform the NN classifier with the Euclidean metric.
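To make the LASM idea concrete, the following is a minimal sketch of a prototype-based classifier with direction-dependent (asymmetric), prototype-local feature weights tuned by a simple reward/punishment rule. The class name, the multiplicative update, and the learning-rate schedule are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

class LASMSketch:
    """Hypothetical sketch of a LASM-style classifier (not the paper's exact method)."""

    def __init__(self, prototypes, labels, lr=0.1):
        self.P = np.asarray(prototypes, dtype=float)   # (m, d) extracted prototypes
        self.y = np.asarray(labels)                    # (m,) prototype class labels
        # One weight per prototype, feature, and side of the difference (below/above).
        self.W = np.ones((len(self.P), self.P.shape[1], 2))
        self.lr = lr

    def _distances(self, x):
        diff = x - self.P                              # (m, d) signed differences
        side = (diff > 0).astype(int)                  # 0: query below prototype, 1: above
        w = np.take_along_axis(self.W, side[..., None], axis=2)[..., 0]
        return np.sum(w * diff ** 2, axis=1)           # asymmetric weighted squared distance

    def predict(self, x):
        return self.y[np.argmin(self._distances(np.asarray(x, dtype=float)))]

    def update(self, x, target):
        """Reinforcement step (assumed form): shrink the active weights of the
        nearest prototype on a correct classification, grow them on an error."""
        x = np.asarray(x, dtype=float)
        j = np.argmin(self._distances(x))
        side = (x - self.P[j] > 0).astype(int)
        sign = -1.0 if self.y[j] == target else 1.0    # punish -> push the prototype away
        idx = np.arange(self.P.shape[1])
        self.W[j, idx, side] *= (1.0 + sign * self.lr)
        np.clip(self.W[j], 1e-3, None, out=self.W[j])  # keep weights positive
```

Training would loop `update` over the data left after prototype extraction; since each prototype keeps its own weights, the learned metric is local, and the two weights per feature make it asymmetric, matching the abstract's description at a high level.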
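For MRM, a natural reading of "minimizes the finite misclassification risk" is that the distance from a query x to a stored case y is the estimated probability that their labels differ, r(x, y) = 1 - Σ_j p(c_j|x) p(c_j|y). The sketch below implements nearest neighbor retrieval under that assumed risk, using Gaussian Naive Bayes posteriors as one possible plug-in probability estimate; both the risk expression and the estimator choice are assumptions for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def mrm_nn_predict(X_train, y_train, X_query):
    """Classify queries with the stored case of minimum estimated risk (assumed MRM form)."""
    est = GaussianNB().fit(X_train, y_train)      # plug-in class-posterior estimates
    P_train = est.predict_proba(X_train)          # (n, k) posteriors for stored cases
    P_query = est.predict_proba(X_query)          # (q, k) posteriors for queries
    # risk[i, j] = 1 - <P_query[i], P_train[j]>: probability labels of i and j differ
    risk = 1.0 - P_query @ P_train.T
    return y_train[np.argmin(risk, axis=1)]       # label of the minimum-risk neighbor
```

Under this formulation the metric inherits whatever probability estimator it is built on, which is consistent with the abstract's remark that MRM performs comparably to the Bayes classifier based on the same estimates.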