Multi-Way Blockmodels for Analyzing Coordinated High-Dimensional Responses

We consider the problem of quantifying temporal coordination between multiple high-dimensional responses. We introduce a family of multi-way stochastic blockmodels suited to this problem. These models avoid pre-processing steps, such as binning and thresholding, that are commonly adopted for this type of problem in biology. We develop two inference procedures, one based on collapsed Gibbs sampling and one based on variational methods. We provide a thorough evaluation of the proposed methods on simulated data in terms of membership and blockmodel estimation, out-of-sample prediction, and run time. We also quantify the effects of censoring procedures such as binning and thresholding on these estimation tasks. We use these models to carry out an empirical analysis of the functional mechanisms driving the coordination between gene expression and metabolite concentrations during carbon and nitrogen starvation in S. cerevisiae.
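To make the idea of a blockmodel fitted by collapsed Gibbs sampling concrete, here is a minimal sketch. It is not the authors' multi-way model. It is a simplified two-way Gaussian blockmodel assumed for illustration. Rows (say, genes) carry labels z_i in {0, ..., K-1} and columns (say, metabolites) carry labels w_j in {0, ..., L-1}. Each continuous entry is modeled as x_ij ~ N(mu[z_i, w_j], sigma2), with mu ~ N(0, tau2) and symmetric Dirichlet(alpha) priors on the cluster proportions. The entries are used as-is, with no binning or thresholding. Block means and proportions are integrated out, so the sampler tracks only the labels and per-block sufficient statistics. The function names, the known-variance assumption and all hyperparameter values are hypothetical. The variational procedure is not shown.

```python
import numpy as np


def block_log_ml(n, s, ss, sigma2, tau2):
    """Log marginal likelihood of a block's entries, with the block mean integrated out.

    Assumed model: x ~ N(mu, sigma2), mu ~ N(0, tau2). n, s and ss are the count,
    sum and sum of squares of the block's entries (scalars or arrays).
    """
    v = 1.0 / (1.0 / tau2 + n / sigma2)  # posterior variance of the block mean
    return (-0.5 * n * np.log(2 * np.pi * sigma2) - ss / (2 * sigma2)
            + 0.5 * np.log(v / tau2) + v * s**2 / (2 * sigma2**2))


def update_rows(X, z, w, K, L, alpha, sigma2, tau2, rng):
    """One collapsed Gibbs sweep over row labels z, holding column labels w fixed."""
    Z, W = np.eye(K)[z], np.eye(L)[w]          # one-hot label matrices
    stats = np.stack([np.outer(Z.sum(0), W.sum(0)),  # entries per block
                      Z.T @ X @ W,                   # per-block sums
                      Z.T @ X**2 @ W])               # per-block sums of squares
    counts = Z.sum(0)                          # rows per row-cluster
    for i in range(X.shape[0]):
        # Row i's contribution (count, sum, sum of squares) to each column-cluster.
        row = np.stack([W.sum(0), X[i] @ W, X[i]**2 @ W])
        stats[:, z[i]] -= row                  # remove row i from its current cluster
        counts[z[i]] -= 1
        # Predictive log-likelihood of row i under each candidate row-cluster.
        gain = (block_log_ml(*(stats + row[:, None, :]), sigma2, tau2)
                - block_log_ml(*stats, sigma2, tau2)).sum(axis=1)
        log_p = np.log(counts + alpha) + gain  # Dirichlet-multinomial prior term
        p = np.exp(log_p - log_p.max())
        z[i] = rng.choice(K, p=p / p.sum())
        stats[:, z[i]] += row                  # add row i back under its new label
        counts[z[i]] += 1
    return z


def gibbs_coblockmodel(X, K, L, n_iter=50, alpha=1.0, sigma2=1.0, tau2=4.0, seed=0):
    """Alternate collapsed Gibbs sweeps over row labels and column labels."""
    rng = np.random.default_rng(seed)
    z = rng.integers(K, size=X.shape[0])
    w = rng.integers(L, size=X.shape[1])
    for _ in range(n_iter):
        z = update_rows(X, z, w, K, L, alpha, sigma2, tau2, rng)
        # The model is symmetric in rows and columns, so reuse the row update on X.T.
        w = update_rows(X.T, w, z, L, K, alpha, sigma2, tau2, rng)
    return z, w


# Usage on simulated data: 40 "genes" x 30 "metabolites" with a planted 2 x 3 block structure.
rng = np.random.default_rng(1)
true_z, true_w = np.repeat([0, 1], 20), np.repeat([0, 1, 2], 10)
mu = np.array([[2.0, -2.0, 0.0], [-2.0, 2.0, 1.0]])
X = mu[true_z][:, true_w] + rng.normal(scale=0.5, size=(40, 30))
z_hat, w_hat = gibbs_coblockmodel(X, K=2, L=3, sigma2=0.25)
```

Collapsing out the block means makes each label update depend only on counts, sums and sums of squares. Those statistics can be adjusted incrementally as a row or column moves between clusters. Cluster labels are identifiable only up to permutation, so z_hat and w_hat should be compared with true_z and true_w as partitions, not label by label.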
