Assessment of variability in GWAS with CRLMM genotyping algorithm on WTCCC Coronary Artery Disease.
Chierici, Marco; Furlanello, Cesare
2010-01-01
Abstract
The robustness of genome-wide association study (GWAS) results depends on the genotyping algorithms used to establish the associations. This paper presents an initial assessment of how the genotyping quality of the Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) algorithm affects the identification of truly significant genes in a GWAS with large sample sizes. Using microarray image data from the Wellcome Trust Case-Control Consortium (WTCCC) for 1991 individuals with coronary artery disease (CAD) and 1500 controls, genetic associations were evaluated under various batch sizes and compositions. The experimental designs used batch sizes of 250, 350, 500, and 2000 samples, as well as the whole set of 3491 samples, with three distributions of cases and controls within each batch: randomized, simply combined (4:3 case-control ratio), or separate case and control batches. The separate composition could create 2-3% discordance in the single nucleotide polymorphism (SNP) results for quality control and statistical analysis, and might contribute to the lack of reproducibility between GWAS. CRLMM shows high genotyping accuracy and stability to batch effects. According to the genotypic and allelic tests (P < 5.0 × 10^-7), nine significant signals on chromosome 9 were found consistently across all batch sizes with the combined design. These findings are important for optimizing the reproducibility of GWAS and confirm the genetic role in the pathophysiology of CAD.
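The abstract refers to an allelic test with a significance threshold of 5.0 × 10^-7 but does not say which statistic was computed. As an illustration only, the sketch below assumes the common 1-degree-of-freedom chi-square test on a 2x2 table of allele counts. The function names and the allele counts in the usage example are hypothetical and are not taken from the study; only the threshold and the sample sizes (1991 cases, 1500 controls) come from the abstract.

```python
import math

# Significance threshold quoted in the abstract.
SIGNIFICANCE_THRESHOLD = 5.0e-7


def allelic_test(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2x2 allele-count table.

    Each argument is a pair (count of allele A, count of allele B).
    Returns (chi2, p_value). Assumed form of the test, not confirmed
    by the source.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_a, col_b = a + c, b + d
    if 0 in (row_cases, row_controls, col_a, col_b):
        raise ValueError("degenerate table: a row or column sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / (row_cases * row_controls * col_a * col_b)
    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def is_significant(p_value, threshold=SIGNIFICANCE_THRESHOLD):
    """True when the p-value falls below the chosen threshold."""
    return p_value < threshold


if __name__ == "__main__":
    # Hypothetical counts for one SNP: 1991 cases carry 3982 alleles,
    # 1500 controls carry 3000 alleles.
    chi2, p = allelic_test((2190, 1792), (1440, 1560))
    print(f"chi2 = {chi2:.2f}, p = {p:.3g}, significant = {is_significant(p)}")
```

Running the same test on genotype calls produced under each batch design, and comparing which SNPs pass the threshold, is one way to check the kind of consistency across batches that the abstract reports.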