Batch Effects in the BRLMM Genotype Calling Algorithm Influence GWAS Results for the Affymetrix 500K Array.

Chierici, Marco; Furlanello, Cesare
2010

Abstract

The Affymetrix GeneChip Human Mapping 500K array is commonly used in genome-wide association studies (GWASs). Recent findings highlight the importance of accurate genotype calling algorithms for reducing inflation of Type I and Type II error rates. Differential results arising from genotype calling errors can introduce severe bias into case-control association study results. Using data from the Wellcome Trust Case Control Consortium, 1991 individuals with coronary artery disease (CAD) and 1500 controls from the UK Blood Services (NBS) were genotyped on the Affymetrix 500K array. Different batch sizes and compositions were used in the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) genotype calling algorithm to assess the batch effect on downstream association analysis. Results show that batch composition (cases and controls genotyped simultaneously or separately) and batch size (the number of individuals processed by BRLMM at a time) can create 2-3% discordance in the results of quality control and statistical analysis, and may contribute to the lack of reproducibility between GWASs. Changes in batch size are largely responsible for differential single-nucleotide polymorphism results, yet we observe evidence of an interaction between batch size and composition that contributes to discordant results in the list of significantly associated loci.
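The kind of discordance described above can be made concrete with a small sketch comparing two calling runs (for example, the same samples processed by BRLMM in a large mixed batch versus smaller case-only and control-only batches). It assumes, hypothetically, that each run's calls are held as a mapping from SNP identifier to a list of genotypes coded 0/1/2 (None for a no-call), with samples in the same order in both runs; this data layout, the function names, and the two discordance measures are illustrative choices and are not taken from BRLMM's output format or from the paper's exact definitions.

```python
"""Hypothetical sketch: quantify discordance between two genotype-calling runs."""
from typing import Dict, List, Optional, Set

# Assumed coding: 0, 1 or 2 copies of the B allele; None means "no call".
Genotype = Optional[int]


def call_discordance(run_a: Dict[str, List[Genotype]],
                     run_b: Dict[str, List[Genotype]]) -> float:
    """Fraction of genotypes, called in both runs, whose calls differ.

    Assumes samples appear in the same order in both runs for every SNP.
    SNPs missing from either run and no-calls in either run are skipped.
    """
    compared = 0
    differing = 0
    for snp, calls_a in run_a.items():
        calls_b = run_b.get(snp)
        if calls_b is None:
            continue  # SNP absent from the second run
        for ga, gb in zip(calls_a, calls_b):
            if ga is None or gb is None:
                continue  # ignore no-calls when measuring concordance
            compared += 1
            if ga != gb:
                differing += 1
    return differing / compared if compared else 0.0


def significant_set_discordance(sig_a: Set[str], sig_b: Set[str]) -> float:
    """Fraction of all significant SNPs that are reported by only one run."""
    union = sig_a | sig_b
    if not union:
        return 0.0
    return len(sig_a ^ sig_b) / len(union)  # symmetric difference over union


if __name__ == "__main__":
    # Toy data with hypothetical SNP identifiers, for illustration only.
    run_a = {"rs1": [0, 1, 2, None], "rs2": [1, 1, 0, 2]}
    run_b = {"rs1": [0, 2, 2, 1], "rs2": [1, 1, 0, 2]}
    print(call_discordance(run_a, run_b))  # 1 differing call out of 7 compared
    print(significant_set_discordance({"rs1", "rs2"}, {"rs2", "rs3"}))
```

Separating call-level discordance from discordance in the list of significant loci mirrors the distinction drawn in the abstract: a small fraction of changed calls can still add or remove SNPs near the significance threshold.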

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/10668