Control of selection bias in microarray data analysis

Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe

We present an experimental setup for analysis and prediction on microarray data, specifically designed to identify and correct the impact of selection bias in high-throughput problems. A number of recently published studies report overoptimistic results because their feature selection and gene profiling processes are affected by overfitting. We outline the selection bias problem and demonstrate its effect on synthetic and microarray data. We then introduce and describe a procedure that successfully deals with the problem through extensive resampling and label randomization techniques, employing Support Vector Machines as the base classifier and an improved version of the Recursive Feature Elimination algorithm for gene ranking.
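
The selection bias effect described above can be illustrated with a minimal sketch on synthetic data. This is not the authors' procedure: for brevity it swaps in a nearest-centroid classifier for the Support Vector Machine and a simple mean-difference filter for Recursive Feature Elimination, and all function names and parameter values (run_demo, cv_accuracy, 40 samples, 2000 features) are hypothetical choices made for illustration. The data are pure noise, so no honest estimate should be far from chance. Selecting features once on the full dataset before cross-validation typically inflates the estimated accuracy, while repeating the selection inside each training fold does not.

```python
import random


def mean_diff_scores(X, y, rows):
    """Score each feature by |mean(class 1) - mean(class 0)| using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    scores = []
    for j in range(len(X[0])):
        m1 = sum(X[i][j] for i in pos) / len(pos)
        m0 = sum(X[i][j] for i in neg) / len(neg)
        scores.append(abs(m1 - m0))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X, y, train, test, feats):
    """Assign each test sample to the class with the closer training centroid."""
    pos = [i for i in train if y[i] == 1]
    neg = [i for i in train if y[i] == 0]
    c1 = [sum(X[i][j] for i in pos) / len(pos) for j in feats]
    c0 = [sum(X[i][j] for i in neg) / len(neg) for j in feats]
    preds = []
    for i in test:
        d1 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c1))
        d0 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c0))
        preds.append(1 if d1 < d0 else 0)
    return preds


def cv_accuracy(X, y, k_feats, n_folds, select_inside_fold, rng):
    """K-fold cross-validated accuracy with feature selection done either
    once on all samples (biased) or separately on each training fold."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Biased protocol: this ranking has already seen every test sample.
    global_feats = top_k(mean_diff_scores(X, y, idx), k_feats)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        if select_inside_fold:
            feats = top_k(mean_diff_scores(X, y, train), k_feats)
        else:
            feats = global_feats
        preds = nearest_centroid_predict(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


def run_demo(n=40, p=2000, k_feats=10, n_folds=5, seed=0):
    """Return (biased, unbiased) accuracy estimates on no-information data."""
    rng = random.Random(seed)
    # Gaussian noise features that are independent of the balanced labels.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [0] * (n // 2) + [1] * (n - n // 2)
    # Same seed for both runs so that both protocols use identical folds.
    biased = cv_accuracy(X, y, k_feats, n_folds, False, random.Random(seed + 1))
    unbiased = cv_accuracy(X, y, k_feats, n_folds, True, random.Random(seed + 1))
    return biased, unbiased


if __name__ == "__main__":
    biased, unbiased = run_demo()
    print("selection on all samples (biased):", biased)
    print("selection inside each fold:       ", unbiased)
```

Because the features carry no class information, any estimate well above 0.5 in the first protocol reflects the selection step having seen the held-out samples. The same comparison can be repeated with shuffled labels on real data as a rough check in the spirit of the label randomization mentioned above.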