An exploratory data analysis method to reveal modular latent structures in high-throughput data

Background
Modular structures are ubiquitous across various types of biological networks. Studying network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data through exploratory analysis can help interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules in which factors act in linear combinations.

Results
We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures. The method can find co-regulative modules that involve non-coexpressive genes.

Conclusions
Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. The method also performed very well on data generated from sparse global latent factor models. The R code is available at http://userwww.service.emory.edu/~tyu8/MLSA/.
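
The abstract's point that genes can be co-regulated without being co-expressed can be illustrated with a small simulation. The sketch below is not the MLSA algorithm. It is a toy example under assumed settings: one module with two independent latent factors, two hypothetical genes with loadings (1, 1) and (1, -1), and Gaussian noise. The function names, sample size and noise level are all illustrative choices.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_module(n_samples=2000, noise_sd=0.3, seed=0):
    """Simulate one module driven by two independent latent factors.

    Assumed toy model (not taken from the paper):
        gene_a = f1 + f2 + noise
        gene_b = f1 - f2 + noise
    Both genes depend on the same two factors, but their loading
    vectors are orthogonal, so their expected correlation is zero.
    """
    rng = random.Random(seed)
    f1 = [rng.gauss(0, 1) for _ in range(n_samples)]
    f2 = [rng.gauss(0, 1) for _ in range(n_samples)]
    gene_a = [a + b + rng.gauss(0, noise_sd) for a, b in zip(f1, f2)]
    gene_b = [a - b + rng.gauss(0, noise_sd) for a, b in zip(f1, f2)]
    return f1, f2, gene_a, gene_b


f1, f2, gene_a, gene_b = simulate_module()

# Correlation between the two co-regulated genes: close to zero.
r_ab = pearson(gene_a, gene_b)

# Correlation of each gene with its own linear combination of factors: high.
r_a_signal = pearson(gene_a, [a + b for a, b in zip(f1, f2)])
r_b_signal = pearson(gene_b, [a - b for a, b in zip(f1, f2)])

print(f"corr(gene_a, gene_b)        = {r_ab:.3f}")
print(f"corr(gene_a, f1 + f2)       = {r_a_signal:.3f}")
print(f"corr(gene_b, f1 - f2)       = {r_b_signal:.3f}")
```

In this toy setting, a clustering method that groups genes by pairwise correlation would be unlikely to place the two genes together, even though both are almost fully explained by the same pair of latent factors. That is the kind of structure the abstract says global dimension reduction and clustering tend to miss.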
