Evaluation of cross-platform and interlaboratory concordance via consensus modelling of genomic measurements

Motivation: A synoptic view of the human genome benefits chiefly from the application of nucleic acid sequencing and microarray technologies. These platforms allow interrogation of patterns such as gene expression and DNA methylation at the vast majority of canonical loci, providing granular insights and opportunities to validate original findings. However, problems arise when validating against a “gold standard” measurement, since this immediately biases all subsequent measurements towards that particular technology or protocol. Since all genomic measurements are estimates, in the absence of a “gold standard” we instead empirically assess the measurement precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. Both cross-platform and cross-locus comparisons can be made across all common loci, allowing identification of technology- and locus-specific tendencies.

Results: We assess technologies including the Infinium MethylationEPIC BeadChip, whole genome bisulfite sequencing (WGBS), two different RNA-Seq protocols (PolyA+ and Ribo-Zero) and five different gene expression array platforms. Each technology is thus characterised relative to the consensus. We showcase a number of applications of the row-linear model, including correlation with known interfering traits. We demonstrate a clear effect of cross-hybridisation on the sensitivity of Infinium methylation arrays. Additionally, we perform a true interlaboratory test on a set of samples interrogated on the same platform across twenty-one separate testing laboratories.
Availability and implementation: A full implementation of the row-linear model, together with additional functions for visualisation, is available in the R package consensus at https://github.com/timpeters82/consensus.

Supplementary information: Supplementary data are available at Bioinformatics online.
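As an illustration of the idea behind the row-linear model, the following is a minimal sketch in plain Python; it is not the API of the consensus R package. It assumes a single locus measured on the same samples by several platforms, takes the per-sample mean across platforms as the consensus, and regresses each platform's measurements on that consensus. The function name `row_linear_fit` and the output keys are hypothetical, and reading the slope as "sensitivity" and the residual scatter as (inverse) "precision" is an assumption made here for illustration.

```python
from statistics import mean


def row_linear_fit(table):
    """Fit a row-linear model to one locus.

    table[i][j] is the measurement of sample j by platform i.
    Each row (platform) is regressed on the column means (the consensus).
    Names and output keys are illustrative assumptions.
    """
    n_cols = len(table[0])
    if any(len(row) != n_cols for row in table):
        raise ValueError("all platforms must measure the same samples")
    if n_cols < 3:
        raise ValueError("need at least three samples to estimate residual scatter")

    # Consensus value for each sample: the mean across platforms.
    consensus = [mean(row[j] for row in table) for j in range(n_cols)]
    grand_mean = mean(consensus)
    centred = [x - grand_mean for x in consensus]
    sxx = sum(c * c for c in centred)
    if sxx == 0:
        raise ValueError("consensus does not vary across samples")

    fits = []
    for row in table:
        average = mean(row)
        # Least-squares slope of this platform against the centred consensus.
        slope = sum(c * z for c, z in zip(centred, row)) / sxx
        residuals = [z - average - slope * c for c, z in zip(centred, row)]
        # Residual standard deviation with n - 2 degrees of freedom.
        residual_sd = (sum(r * r for r in residuals) / (n_cols - 2)) ** 0.5
        fits.append(
            {"average": average, "sensitivity": slope, "residual_sd": residual_sd}
        )
    return fits


# Three hypothetical platforms measuring four samples at one locus.
example = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 3.0, 3.0, 3.0],
]
example_fits = row_linear_fit(example)
```

Because the consensus is the mean of the rows, the fitted slopes average to one by construction. A platform with a slope above one therefore responds more strongly than the consensus, and one with a slope below one responds less strongly.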
