Modern data mining tools in descriptive sensory analysis: a case study with a Random Forest approach.

Granitto, P.; Gasperi, F.; Biasioli, F.; Trainotti, E.; Furlanello, Cesare

In this paper we introduce random forest (RF) as a new modeling technique in the field of sensory analysis. As a case study we apply RF to the predictive discrimination of six typical cheeses of the Trentino province (northern Italy) from data obtained by quantitative descriptive analysis. The corresponding sensory profiling was carried out by eight trained assessors using a specifically developed language of 35 attributes. We compare the discrimination capabilities of RF with those of linear discriminant analysis (LDA) and discriminant partial least squares (dPLS). The RF models are more accurate, with smaller prediction errors than LDA and dPLS. RF also makes it possible to analyze the developed models graphically, using multi-dimensional scaling plots based on an internal measure of similarity between samples. We compare these plots with similar ones derived from principal component analysis and LDA, and find that the same qualitative information can be extracted from all methods. The RF model also gives an estimate of the relative importance of each sensory attribute for the discriminant function. We couple this measure with an appropriate experimental setup in order to obtain an unbiased and stable method for variable selection. This method compares favorably with sequential selection based on LDA models. © 2006 Elsevier Ltd. All rights reserved.
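
The abstract does not define the "internal measure of similarity between samples". A common choice for random forests is the proximity: the fraction of trees in which two samples end up in the same terminal node. The sketch below assumes that definition and uses made-up leaf assignments (the name rf_proximity and the data are hypothetical, not taken from the paper). It shows only how such a matrix could be built and converted to dissimilarities for a multi-dimensional scaling plot.

```python
from itertools import combinations


def rf_proximity(leaf_ids):
    """Return the proximity matrix for a fitted forest.

    leaf_ids[i][t] is the index of the terminal node (leaf) that
    sample i reaches in tree t. The proximity of two samples is the
    fraction of trees in which they share a leaf (assumed definition).
    """
    n_samples = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    # A sample always shares every leaf with itself.
    prox = [[1.0 if i == j else 0.0 for j in range(n_samples)]
            for i in range(n_samples)]
    for i, j in combinations(range(n_samples), 2):
        shared = sum(a == b for a, b in zip(leaf_ids[i], leaf_ids[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 4 samples, 3 trees.
leaf_ids = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 0, 3],
    [1, 0, 2],
]

prox = rf_proximity(leaf_ids)

# Multi-dimensional scaling expects dissimilarities; 1 - proximity
# is one simple conversion (an assumption, other choices exist).
dissimilarity = [[1.0 - p for p in row] for row in prox]
```

In practice the leaf assignments would come from a fitted forest, and the dissimilarity matrix would be passed to any multi-dimensional scaling routine that accepts precomputed distances.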