Resolving the polymorphism-in-probe problem is critical for correct interpretation of expression QTL studies

Polymorphisms in the target mRNA sequence can greatly affect the binding affinity of microarray probe sequences, leading to false-positive and false-negative expression quantitative trait locus (eQTL) signals for any other polymorphisms in linkage disequilibrium with them. We provide the most complete solution to this problem by using the latest genome and exome sequence reference data to identify almost all common polymorphisms (frequency >1% in Europeans) in probe sequences for two commonly used microarray panels: the gene-based Illumina Human HT12 array, which uses 50-mer probes, and the exon-based Affymetrix Human Exon 1.0 ST array, which uses 25-mer probes. We demonstrate the impact of this problem using cerebellum and frontal cortex tissues from 438 neuropathologically normal individuals. Although only a small proportion of the probes contain polymorphisms, these probes account for a large proportion of apparent eQTL signals, so many false signals are declared as real. We find that previous protocols control the polymorphism-in-probe problem insufficiently, and we illustrate this with notable false-positive and false-negative examples in MAPT and PRICKLE1 that can be found in many eQTL databases. We recommend that both new and existing eQTL data sets be carefully checked to address this issue adequately.
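The core check described above, flagging probes whose target sequence overlaps a common polymorphism, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the `Probe` and `Variant` records, the `flag_probes` function, and the example coordinates are all hypothetical, and the sketch assumes that probe coordinates and variant positions are already on the same 1-based genome build. Only the >1% frequency threshold and the 50-mer and 25-mer probe lengths come from the text.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import NamedTuple


class Probe(NamedTuple):
    """Hypothetical probe record with 1-based inclusive coordinates."""
    probe_id: str
    chrom: str
    start: int
    end: int


class Variant(NamedTuple):
    """Hypothetical variant record; maf is the minor allele frequency."""
    chrom: str
    pos: int
    maf: float


def flag_probes(probes, variants, min_maf=0.01):
    """Return {probe_id: [positions]} for probes overlapping common variants.

    A variant counts as common when its frequency is strictly above
    min_maf (1% by default, matching the threshold quoted in the text).
    """
    # Collect positions of common variants per chromosome, then sort them
    # so each probe can be checked with a binary search.
    positions = defaultdict(list)
    for v in variants:
        if v.maf > min_maf:
            positions[v.chrom].append(v.pos)
    for chrom in positions:
        positions[chrom].sort()

    flagged = {}
    for p in probes:
        pos_list = positions.get(p.chrom, [])
        i = bisect_left(pos_list, p.start)  # first variant at or after start
        hits = []
        while i < len(pos_list) and pos_list[i] <= p.end:
            hits.append(pos_list[i])
            i += 1
        if hits:
            flagged[p.probe_id] = hits
    return flagged


# Made-up example data: one 50-mer probe and one 25-mer probe.
example_probes = [
    Probe("probe_50mer", "chr17", 1000, 1049),
    Probe("probe_25mer", "chr12", 500, 524),
]
example_variants = [
    Variant("chr17", 1010, 0.20),   # common, inside the 50-mer probe
    Variant("chr17", 1049, 0.05),   # common, on the last base of the probe
    Variant("chr17", 1050, 0.30),   # common, but one base past the probe
    Variant("chr12", 510, 0.005),   # inside the 25-mer probe, but rare
]
flagged_probes = flag_probes(example_probes, example_variants)
```

Probes returned by such a check would then be removed or treated with caution before eQTL testing. The sketch ignores insertions and deletions, strand, and probes that span exon junctions, all of which a real annotation pipeline would need to handle.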
