Bias-Variance Control via Hard Points Shaving
Merler, Stefano; Caprile, Bruno Giovanni; Furlanello, Cesare
2003-01-01
Abstract
The AdaBoost algorithm is one of the most successful classification methods in use. While the algorithm largely retains its generality and practical applicability, theoretical and experimental work shows that AdaBoost can overfit when applied to noisy data. In this paper, a procedure is proposed for bias-variance control when the AdaBoost algorithm is employed in classification tasks. The method is based on an earlier notion of easy and hard training patterns, which emerges from an analysis of the dynamical evolution of AdaBoost weights. More specifically, the procedure consists of sorting data points by hardness and progressively eliminating the hardest among them from the data set. The effectiveness of the method is tested and discussed on synthetic as well as natural data.
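
The following is a minimal Python sketch of the shaving idea described in the abstract. It assumes hardness can be scored by a point's mean sample weight across boosting rounds; the paper derives hardness from the dynamics of AdaBoost weights, so this scoring rule, the helper names hardness_scores and shave_hard_points, and the shaving fraction are illustrative assumptions, not the authors' implementation.

```python
# Sketch of "hard points shaving" for AdaBoost, under the assumption
# that a point's hardness is its mean sample weight across rounds.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

def hardness_scores(X, y, n_rounds=100):
    """Run discrete AdaBoost with stumps, tracking sample weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    weight_history = np.empty((n_rounds, n))
    for t in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.clip(np.dot(w, miss), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # AdaBoost reweighting: up-weight misclassified points.
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        w /= w.sum()
        weight_history[t] = w
    # Hard points accumulate weight; score each point by its mean weight.
    return weight_history.mean(axis=0)

def shave_hard_points(X, y, fraction=0.05, n_rounds=100):
    """Drop the hardest `fraction` of training points, then refit AdaBoost."""
    scores = hardness_scores(X, y, n_rounds)
    keep = np.argsort(scores)[: int(len(y) * (1 - fraction))]
    clf = AdaBoostClassifier(n_estimators=n_rounds)
    clf.fit(X[keep], y[keep])
    return clf, keep
```

In practice, one would sweep the shaving fraction and select the value that minimizes error on held-out data, since the appropriate amount of shaving depends on the noise level of the data set.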