Network Enrichment Analysis in Complex Experiments

Cellular functions of living organisms are carried out through complex systems of interacting components. Including such interactions in the analysis, and considering sub-systems defined by biological pathways instead of individual components (e.g. genes), can lead to new findings about complex biological mechanisms. Networks are often used to capture such interactions and can be incorporated in models to improve the efficiency in estimation and inference. In this paper, we propose a model for incorporating external information about interactions among genes (proteins/metabolites) in differential analysis of gene sets. We exploit the framework of mixed linear models and propose a flexible inference procedure for analysis of changes in biological pathways. The proposed method facilitates the analysis of complex experiments, including multiple experimental conditions and temporal correlations among observations. We propose an efficient iterative algorithm for estimation of the model parameters and show that the proposed framework is asymptotically robust to the presence of noise in the network information. The performance of the proposed model is illustrated through the analysis of gene expression data for environmental stress response (ESR) in yeast, as well as simulated data sets.

[1]  Jan Kmenta,et al.  A General Procedure for Obtaining Maximum Likelihood Estimates in Generalized Regression Models , 1974 .

[2]  L. Mirsky A trace inequality of John von Neumann , 1975 .

[3]  D. Bates,et al.  Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data , 1988 .

[4]  S. Haberman Concavity and estimation , 1989 .

[5]  Jan de Leeuw,et al.  Block-relaxation Algorithms in Statistics , 1994 .

[6]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[7]  Michael I. Jordan Graphical Models , 2003 .

[8]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[9]  D. Botstein,et al.  Genomic expression programs in the response of yeast cells to environmental changes. , 2000, Molecular biology of the cell.

[10]  G. Churchill,et al.  Experimental design for gene expression microarrays. , 2001, Biostatistics.

[11]  E. Lander,et al.  Remodeling of yeast genome expression in response to environmental changes. , 2001, Molecular biology of the cell.

[12]  G. Churchill,et al.  Statistical design and the analysis of gene expression microarray data. , 2001, Genetical research.

[13]  Roger E Bumgarner,et al.  Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. , 2001, Science.

[14]  T. Speed,et al.  Design issues for cDNA microarray experiments , 2002, Nature Reviews Genetics.

[15]  Benno Schwikowski,et al.  Discovering regulatory and signalling circuits in molecular interaction networks , 2002, ISMB.

[16]  Margaret Werner-Washburne,et al.  The genomics of yeast responses to environmental stress and starvation , 2002, Functional & Integrative Genomics.

[17]  D. Sengupta Linear models , 2003 .

[18]  Nicola J. Rinaldi,et al.  Transcriptional regulatory code of a eukaryotic genome , 2004, Nature.

[19]  Thomas Lengauer,et al.  Statistical Applications in Genetics and Molecular Biology Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data , 2011 .

[20]  Jon C. Dattorro,et al.  Convex Optimization & Euclidean Distance Geometry , 2004 .

[21]  P. Park,et al.  Discovering statistically significant pathways in expression profiling studies. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Leonhard Held,et al.  Gaussian Markov Random Fields: Theory and Applications , 2005 .

[24]  R. Tibshirani,et al.  On testing the significance of sets of genes , 2006, math/0610667.

[25]  F Hong,et al.  Functional Hierarchical Models for Identifying Genes with Different Time‐Course Expression Profiles , 2006, Biometrics.

[26]  Hiroshi Mamitsuka,et al.  A hidden Markov model-based approach for identifying timing differences in gene expression under different experimental factors , 2007, Bioinform..

[27]  Peter Bühlmann,et al.  Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm , 2007, J. Mach. Learn. Res..

[28]  T. Richardson,et al.  Estimation of a covariance matrix with zeros , 2005, math/0508268.

[29]  E. Marcotte,et al.  An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae , 2007, PloS one.

[30]  Hongzhe Li,et al.  A Markov random field model for network-based analysis of genomic data , 2007, Bioinform..

[31]  Jim Hefferon,et al.  Linear Algebra , 2012 .

[32]  Wei Pan,et al.  BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm612 Systems biology , 2022 .

[33]  Hongzhe Li,et al.  A hidden spatial-temporal Markov random field model for network-based analysis of time course gene expression data , 2008, 0803.3942.

[34]  Guido Sanguinetti,et al.  MMG: a probabilistic tool to identify submodules of metabolic pathways , 2008, Bioinform..

[35]  Ali Shojaie,et al.  Analysis of Gene Sets Based on the Underlying Regulatory Network , 2009, J. Comput. Biol..

[36]  Albert D. Shieh,et al.  Statistical Applications in Genetics and Molecular Biology , 2010 .

[37]  Ali Shojaie,et al.  Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. , 2009, Biometrika.

[38]  U. Feige,et al.  Spectral Graph Theory , 2015 .