Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model

Background: A living cell has a complex, hierarchically organized signaling system that encodes and assimilates diverse environmental and intracellular signals and transmits signals that control cellular responses, including a tightly controlled transcriptional program. An important yet challenging task in systems biology is to reconstruct cellular signaling systems in a data-driven manner. In this study, we investigate the utility of deep hierarchical neural networks in learning and representing the hierarchical organization of the yeast transcriptomic machinery.

Results: We designed a sparse autoencoder model consisting of a layer of observed variables and four layers of hidden variables. We applied the model to over a thousand yeast microarrays to learn the encoding system of the yeast transcriptomic machinery. After model selection, we evaluated whether the trained models captured biologically sensible information. We show that the latent variables in the first hidden layer correctly captured the signals of yeast transcription factors (TFs), yielding a close to one-to-one mapping between latent variables and TFs. We further show that genes regulated by latent variables at higher hidden layers are often involved in a common biological process, and that the hierarchical relationships between latent variables conform to existing knowledge. Finally, we show that the information captured by the latent variables provides a more abstract and concise representation of each microarray, enabling the identification of better-separated clusters than a gene-based representation.

Conclusions: Contemporary deep hierarchical latent variable models, such as the autoencoder, can be used to partially recover the organization of the transcriptomic machinery.
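The architecture named in the Results (one observed layer followed by four hidden layers, trained with a sparsity constraint) can be illustrated with a minimal forward-pass sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the layer sizes, the sigmoid units, the L1 penalty on hidden activations, the separate (untied) decoder weights, and the helper names `init_layer`, `encode`, `decode`, and `sparse_loss`. The sketch only computes the loss for one random input; it does not train the model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j][i] links input i to unit j."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, inputs):
    """Apply one fully connected sigmoid layer to a list of inputs."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def encode(encoder, x):
    """Return the activations of every hidden layer, lowest layer first."""
    activations = []
    h = x
    for layer in encoder:
        h = forward(layer, h)
        activations.append(h)
    return activations


def decode(decoder, top):
    """Map the top hidden layer back down to the observed layer."""
    h = top
    for layer in decoder:
        h = forward(layer, h)
    return h


def sparse_loss(x, recon, activations, sparsity_weight):
    """Mean squared reconstruction error plus an L1 penalty on hidden units.

    The L1 penalty is an assumed stand-in for whatever sparsity
    constraint the original model uses.
    """
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    l1 = sum(sum(abs(a) for a in layer) for layer in activations)
    return mse + sparsity_weight * l1


rng = random.Random(0)

# Hypothetical sizes: 20 observed "genes", then four hidden layers.
sizes = [20, 12, 8, 5, 3]
pairs = list(zip(sizes[:-1], sizes[1:]))
encoder = [init_layer(n_in, n_out, rng) for n_in, n_out in pairs]
# The decoder mirrors the encoder, from the top hidden layer back down.
decoder = [init_layer(n_out, n_in, rng) for n_in, n_out in pairs][::-1]

x = [rng.random() for _ in range(sizes[0])]  # one synthetic expression profile
activations = encode(encoder, x)
recon = decode(decoder, activations[-1])
loss = sparse_loss(x, recon, activations, sparsity_weight=0.01)
```

In this sketch, `activations[0]` plays the role of the first hidden layer (the one the abstract relates to transcription factors) and `activations[-1]` is the most abstract representation of the profile.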
