Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery

The diagnostic and prognostic potential of the vast quantity of publicly available microarray data has driven the development of methods for integrating data from different microarray platforms. Cross-platform integration, when appropriately implemented, has been shown to improve the reproducibility and robustness of gene signature biomarkers. Microarray platform integration can be conceptually divided into approaches that perform early-stage integration (cross-platform normalization) and those that perform late-stage integration (meta-analysis). A growing number of statistical methods and associated software for platform integration are available to the user; however, an understanding of their comparative performance and potential pitfalls is critical for best implementation. In this review we provide evidence-based, practical guidance to researchers performing cross-platform integration, particularly with the objective of discovering biomarkers.
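
The distinction between early-stage and late-stage integration can be illustrated with a minimal Python sketch. It is not a method taken from this review. The early-stage path uses simple per-study z-scoring of one gene before pooling samples, as a stand-in for a real cross-platform normalization method. The late-stage path combines per-study p-values with Fisher's method, one common meta-analysis choice. All function names and the toy values are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize_per_study(values):
    """Z-score one gene's expression values within a single study."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # A constant gene carries no information after centering.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def merge_studies(studies):
    """Early-stage sketch: standardize each study, then pool the samples.

    Assumption: per-study z-scoring stands in for a real cross-platform
    normalization method; it only removes study-level location and scale.
    """
    merged = []
    for values in studies:
        merged.extend(standardize_per_study(values))
    return merged


def fisher_combined_p(p_values):
    """Late-stage sketch: combine per-study p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form used below.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


# Toy data: one gene measured in two studies on very different scales.
study_a = [1.0, 2.0, 3.0]
study_b = [10.0, 20.0, 30.0]

pooled = merge_studies([study_a, study_b])   # early-stage: one merged dataset
combined = fisher_combined_p([0.05, 0.05])   # late-stage: one combined p-value
```

In the early-stage path, downstream analysis runs once on the pooled samples. In the late-stage path, each study is analyzed separately and only the summary statistics are combined.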
