TOMAS: A novel TOpology-aware Meta-Analysis approach applied to System biology

With the explosion of high-throughput data, effective integrative analysis is needed to decipher the knowledge accumulated across multiple studies. However, batch effects, patient heterogeneity, and disease complexity all complicate the integration of data from different sources. Here we introduce TOMAS, a novel meta-analysis framework that transforms the challenging meta-analysis problem into a set of standard analysis problems that can be solved efficiently. The framework uses techniques based on both p-values and effect sizes to identify differentially expressed genes and estimate their expression changes on a genome-wide scale. The resulting statistics enable topology-aware pathway analysis of the given phenotypes, in which the topological relationships among genes are taken into account. We compare TOMAS with four meta-analysis approaches, as well as with three dedicated pathway analysis approaches that employ multiple datasets (MetaPath). The eight approaches were tested on 609 samples from nine Alzheimer's disease studies conducted in independent laboratories on different sets of patients and tissues. We demonstrate that the topology-based meta-analysis framework overcomes noise and bias to identify pathways that are known to be implicated in Alzheimer's disease. Although presented here in the context of genomic data analysis, the proposed framework is general enough to be applied in other research areas.
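
The abstract does not state which p-value or effect-size combination rules TOMAS applies at the gene level. As a hedged illustration of the two families it mentions, the sketch below swaps in two common choices: Fisher's method for combining one gene's p-values across studies, and a DerSimonian-Laird random-effects model for pooling its per-study effect sizes (for example, standardized mean differences with their variances). These are not necessarily the rules TOMAS uses, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k df under H0.

    Each p-value must lie in (0, 1]; log(0) is undefined.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Chi-square survival function for even df 2k has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def random_effects_dl(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling; returns (mu, se, tau2)."""
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    k = len(effects)
    # Method-of-moments estimate of between-study variance, truncated at 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sw_re = sum(w_re)
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    return mu, se, tau2


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Toy inputs for one gene in three hypothetical studies (made-up numbers).
    pvals = [0.03, 0.20, 0.008]
    effects = [0.8, 0.3, 1.1]
    variances = [0.09, 0.12, 0.07]
    print("Fisher combined p:", fisher_combined_p(pvals))
    mu, se, tau2 = random_effects_dl(effects, variances)
    print("pooled effect:", mu, "SE:", se, "tau^2:", tau2)
    print("p-value of pooled effect:", two_sided_p(mu / se))
```

Running both rules per gene gives a significance measure (combined p-value) and a magnitude with direction (pooled effect), which is the kind of pair a downstream pathway analysis can consume.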

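The abstract says the gene-level statistics feed a topology-aware pathway analysis but does not give the propagation rule. One widely used rule, from impact analysis and SPIA, is the perturbation factor PF(g) = ΔE(g) + Σ_u β_ug · PF(u) / N_ds(u), where u ranges over genes with an edge into g, β_ug is +1 for activation and -1 for inhibition, and N_ds(u) is the number of genes directly downstream of u. It is an assumption that TOMAS uses exactly this rule. The sketch below solves the equations as the linear system (I - B) PF = ΔE; the edge format and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, float]  # (source gene, target gene, beta: +1 activation, -1 inhibition)


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("I - B is singular; perturbations cannot be resolved")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perturbation_factors(genes: Sequence[str], edges: Sequence[Edge],
                         delta_e: Dict[str, float]) -> Dict[str, float]:
    """Solve PF = dE + B * PF, with B[g][u] = beta_ug / N_ds(u)."""
    idx = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    n_ds = [0] * n  # number of direct downstream genes (out-degree)
    for u, _, _ in edges:
        n_ds[idx[u]] += 1
    # Build A = I - B.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, g, beta in edges:
        a[idx[g]][idx[u]] -= beta / n_ds[idx[u]]
    # Genes absent from delta_e are treated as not differentially expressed.
    rhs = [delta_e.get(g, 0.0) for g in genes]
    return dict(zip(genes, solve_linear(a, rhs)))


def net_accumulation(pf: Dict[str, float], delta_e: Dict[str, float]) -> Dict[str, float]:
    """Perturbation each gene receives from upstream genes: PF(g) - dE(g)."""
    return {g: pf[g] - delta_e.get(g, 0.0) for g in pf}


if __name__ == "__main__":
    # Hypothetical diamond pathway A -> B, A -> C, B -> D, C -> D.
    genes = ["A", "B", "C", "D"]
    edges = [("A", "B", 1.0), ("A", "C", 1.0), ("B", "D", 1.0), ("C", "D", 1.0)]
    de = {"A": 2.0}  # e.g. a pooled log fold change for gene A (made-up value)
    pf = perturbation_factors(genes, edges, de)
    print(pf)
    print(net_accumulation(pf, de))
```

In this model, a change in an upstream gene propagates to its targets, split across its downstream genes and signed by the interaction type. Cycles with unit weights can make I - B singular, which is why the solver raises an error instead of returning meaningless values.
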