Control of selection bias in microarray data analysis
Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe
2003-01-01
Abstract
We present an experimental setup for analysis and prediction on microarray data, specifically designed to identify and correct the impact of selection bias in high-throughput problems. A number of recently published, overoptimistic studies present feature selection and gene profiling processes that incur overfitting effects. We outline the selection bias problem and demonstrate its effect on synthetic and microarray data. We then introduce and describe a procedure that successfully deals with the problem through extensive resampling and label randomization techniques, employing Support Vector Machines as the base classifier and an improved version of the Recursive Feature Elimination algorithm for gene ranking.
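The selection bias outlined in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' procedure: it swaps in a simple mean-difference filter for Recursive Feature Elimination and a nearest-centroid rule for the Support Vector Machine, and every name in it (`make_noise_data`, `select_features`, `loo_accuracy`, the sample and feature counts) is a hypothetical choice made for illustration. The data are pure noise with arbitrary labels, so any accuracy well above chance is an artifact of selecting features on samples that are later used for testing.

```python
import random


def make_noise_data(n_samples, n_features, rng):
    """Gaussian noise features with alternating labels: no real signal exists."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, features):
    """Per-class mean of each listed feature."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return means


def select_features(X, y, k):
    """Keep the k features with the largest absolute class-mean difference.
    (Stand-in for gene ranking; the paper uses an improved RFE instead.)"""
    all_features = range(len(X[0]))
    m = class_means(X, y, all_features)
    scores = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_features, key=lambda j: scores[j], reverse=True)[:k]


def predict(X_train, y_train, x, features):
    """Nearest-centroid rule on the selected features (stand-in for an SVM)."""
    m = class_means(X_train, y_train, features)
    dist = {label: sum((x[j] - c) ** 2 for j, c in zip(features, m[label]))
            for label in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1


def loo_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy. With select_inside=False, features are chosen
    once on ALL samples, so the held-out sample leaks into the selection."""
    fixed = None if select_inside else select_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = select_features(X_train, y_train, k) if select_inside else fixed
        correct += predict(X_train, y_train, X[i], features) == y[i]
    return correct / len(X)


rng = random.Random(0)
X, y = make_noise_data(n_samples=20, n_features=1000, rng=rng)
biased = loo_accuracy(X, y, k=10, select_inside=False)
unbiased = loo_accuracy(X, y, k=10, select_inside=True)
print(f"selection outside the resampling loop: {biased:.2f}")
print(f"selection inside the resampling loop:  {unbiased:.2f}")
```

Under these assumptions, the first estimate is expected to look far better than the second, which should sit near chance level. Repeating the whole run with shuffled labels is the label randomization check named in the abstract: a pipeline free of selection bias should not beat chance on randomized labels.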