A Bayesian Framework for the Classification of Microbial Gene Activity States

Numerous methods for classifying gene activity states from gene expression data have been proposed for use in downstream applications, such as incorporating transcriptomics data into metabolic models to improve the resulting flux predictions. These methods typically classify each gene in each experimental condition as belonging to one of two states: active (the gene product is part of an active cellular mechanism) or inactive (the cellular mechanism is not active). Existing methods suffer from several limitations: they enforce unrealistic constraints on the overall proportions of active and inactive genes, fail to leverage a priori knowledge of gene co-regulation, fail to account for differences between genes, and fail to provide statistically meaningful confidence estimates. We propose a flexible Bayesian approach to classifying gene activity states based on a Gaussian mixture model. The model integrates genome-wide transcriptomics data from multiple conditions with information about gene co-regulation to provide an activity state confidence estimate for each gene in each condition. We compare our method with existing methods on simulated data, on real data from 907 E. coli gene expression arrays, and against experimentally measured flux values in 29 conditions, and show that it provides more consistent and accurate results than existing methods across a variety of metrics.
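To illustrate the general idea of a two-state Gaussian mixture applied to expression data, the following is a minimal sketch and not the model proposed here: it fits a plain maximum-likelihood mixture by EM to a single gene's expression values across conditions and reports, for each condition, the posterior probability of the higher-mean ("active") component. It uses no priors and no co-regulation information, and the function names, initialisation, and simulated expression values are all assumptions made for illustration.

```python
import numpy as np


def _responsibilities(x, pi, mu, var):
    """E-step: posterior component weights per sample, plus the log-likelihood."""
    log_pdf = -0.5 * (np.log(2.0 * np.pi * var) + (x[:, None] - mu) ** 2 / var)
    log_w = np.log(pi) + log_pdf
    log_norm = np.logaddexp(log_w[:, 0], log_w[:, 1])
    return np.exp(log_w - log_norm[:, None]), log_norm.sum()


def fit_two_state_mixture(expression, n_iter=200, tol=1e-8):
    """Fit a 1-D two-component Gaussian mixture by EM (hypothetical helper).

    Returns P(active) per condition, where "active" is taken to be the
    component with the larger mean, along with the fitted parameters.
    """
    x = np.asarray(expression, dtype=float)
    # Assumed initialisation: component means at the lower and upper quartiles.
    mu = np.percentile(x, [25, 75]).astype(float)
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp, ll = _responsibilities(x, pi, mu, var)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: update mixing weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    resp, _ = _responsibilities(x, pi, mu, var)
    active = int(np.argmax(mu))
    return resp[:, active], mu, var, pi


# Simulated log-expression for one gene: 50 "inactive" and 50 "active" conditions.
rng = np.random.default_rng(0)
simulated = np.concatenate([rng.normal(4.0, 0.5, 50), rng.normal(10.0, 0.5, 50)])
p_active, means, variances, weights = fit_two_state_mixture(simulated)
```

Here `p_active[i]` plays the role of a per-condition confidence estimate. A Bayesian treatment would additionally place priors on the mixture parameters and could share information across co-regulated genes, neither of which this sketch attempts.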
