Recursive Feature Elimination with Random Forest for PTR-MS analysis of agroindustrial products.

Granitto, P.; Furlanello, Cesare; Biasioli, F.; Gasperi, F.

In this paper we apply the recently introduced Random Forest-Recursive Feature Elimination (RF-RFE) algorithm to the identification of relevant features in the spectra produced by Proton Transfer Reaction-Mass Spectrometry (PTR-MS) analysis of agroindustrial products. The method is compared with the more traditional Support Vector Machine-Recursive Feature Elimination (SVM-RFE), extended to handle multiclass problems, and with a baseline method based on the Kruskal-Wallis statistic (KWS). In particular, we apply all three selection methods to the discrimination of nine varieties of strawberries and six varieties of typical cheeses from Trentino Province, northern Italy. Using replicated experiments, we estimate unbiased generalization errors. Our results show that RF-RFE outperforms SVM-RFE and KWS on the task of finding small subsets of features with high discrimination levels on PTR-MS data sets. We also show how selection probabilities and feature co-occurrence can be used to highlight the most relevant features for discrimination.
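As an illustration of the baseline mentioned above, the following is a minimal sketch of ranking features by the Kruskal-Wallis H statistic, using only the Python standard library. The abstract does not describe how the KWS baseline was implemented, so the function names (`average_ranks`, `kruskal_wallis_h`, `rank_features`), the tie correction, and the toy data are assumptions made for illustration and not the authors' procedure.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Tie-corrected Kruskal-Wallis H for one feature across class labels.

    Assumes at least two samples; hypothetical helper, not from the paper.
    """
    n = len(values)
    ranks = average_ranks(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, label in zip(ranks, labels):
        rank_sums[label] += rank
        counts[label] += 1
    h = (12.0 / (n * (n + 1))
         * sum(rank_sums[c] ** 2 / counts[c] for c in counts)
         - 3.0 * (n + 1))
    # Standard tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = defaultdict(int)
    for v in values:
        tie_sizes[v] += 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes.values()) / float(n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def rank_features(X, y):
    """Score every column of X and return (indices sorted by H, scores)."""
    n_features = len(X[0])
    scores = [kruskal_wallis_h([row[j] for row in X], y)
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (invented): feature 0 separates the three classes, feature 1 does not.
X_toy = [[1, 1], [2, 6], [3, 3], [4, 4], [5, 5], [6, 2]]
y_toy = ["a", "a", "b", "b", "c", "c"]
toy_order, toy_scores = rank_features(X_toy, y_toy)
```

Unlike RF-RFE and SVM-RFE, a filter of this kind scores each feature independently of the others and is computed once, with no iterative refitting and elimination.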