Entropy-based gene ranking without selection bias for the predictive classification of microarray data

Furlanello, Cesare;Merler, Stefano;Jurman, Giuseppe;Serafini, Maria
2002-01-01

Abstract

We describe a ranking method for identifying biomedically important genes for diagnosis and therapy, and provide an experimental scheme for developing predictive classification models on microarray data. The combined approach is designed to correct the selection bias problem: prediction error estimates are too optimistic when rules are tested on samples that were already used in the feature selection process. Our E-RFE ranking algorithm is a wrapper based on the entropy of the distribution of weights obtained from an SVM classifier. To control and speed up the ranking process, E-RFE eliminates chunks of uninteresting genes until the distribution of the remaining weights stabilizes at higher entropy levels, and then proceeds with smaller elimination steps, as in the original recursive feature elimination (RFE) method. To control the selection bias, we use an external stratified partition resampling scheme and, at each run, an internal K-fold cross-validation for the E-RFE feature ranking. This doubly intensive model selection and error estimation process is made viable by a 100-fold speed-up of E-RFE with respect to RFE, without a decrease in classification accuracy. The experimental scheme supports the identification of the gene candidates that contribute most to predictive accuracy.
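The entropy-driven elimination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `weight_entropy`, `mean_difference_weights` and `entropy_rfe` are hypothetical, and so are the bin count, the entropy threshold and the rule of dropping the lowest histogram bin. The weight function is a toy stand-in (absolute difference of class means); in E-RFE the weights would come from an SVM trained on the remaining features.

```python
import math
import random


def weight_entropy(weights, n_bins=10):
    """Shannon entropy (in bits) of the histogram of weights rescaled to [0, 1]."""
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0
    counts = [0] * n_bins
    for w in weights:
        counts[min(int((w - lo) / span * n_bins), n_bins - 1)] += 1
    n = len(weights)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def mean_difference_weights(X, y, features):
    """Toy stand-in for SVM weights: |difference of class means| per feature."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label != 1]
    return [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in features
    ]


def entropy_rfe(X, y, weight_fn=mean_difference_weights, n_bins=10,
                entropy_threshold=None):
    """Return all feature indices, ranked from most to least important."""
    if entropy_threshold is None:
        entropy_threshold = 0.5 * math.log2(n_bins)  # assumed cut-off
    remaining = list(range(len(X[0])))
    eliminated = []  # filled from the least important feature onwards
    while len(remaining) > 1:
        w = weight_fn(X, y, remaining)
        lo, hi = min(w), max(w)
        order = sorted(range(len(remaining)), key=lambda i: w[i])
        if hi > lo and weight_entropy(w, n_bins) < entropy_threshold:
            # Low entropy: weights are concentrated in few bins, so drop
            # every feature in the lowest bin as one chunk.
            cut = lo + (hi - lo) / n_bins
            drop = [i for i in order if w[i] < cut] or order[:1]
        else:
            # Higher entropy: fall back to one-at-a-time steps, as in RFE.
            drop = order[:1]
        dropped = set(drop)
        eliminated.extend(remaining[i] for i in drop)
        remaining = [f for i, f in enumerate(remaining) if i not in dropped]
    return remaining + eliminated[::-1]


# Synthetic data: 30 samples, 200 features, only the first 3 are informative.
rng = random.Random(0)
y = [0] * 15 + [1] * 15
X = [
    [rng.gauss(0.0, 1.0) + (5.0 if label == 1 and j < 3 else 0.0)
     for j in range(200)]
    for label in y
]
ranking = entropy_rfe(X, y)
print(ranking[:5])
```

To respect the selection bias correction described above, a ranking of this kind would be computed only on the training portion of each external resampling split, never on the samples later used to estimate the prediction error.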

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/737