Mining biologically active patterns in metabolic pathways using microarray expression profiles

We present a new probabilistic framework for analyzing a metabolic pathway using microarray expression profiles. Our goal is to find biologically significant paths and patterns in a given metabolic pathway. Our approach first builds a Markov model from the graph structure of a known metabolic pathway, and then uses an EM algorithm to estimate the parameters of a mixture of such Markov models from microarray data. In our experiments, we used a main pathway of glycolysis to evaluate the effectiveness of our method. We first compared the performance of our method with that of another method in a supervised learning setting, and found that our method significantly outperformed the other method, which was trained on microarray data only. We further analyzed the trained models and obtained a number of new biological findings on frequent patterns (paths) and long-range correlations in a metabolic pathway.
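To make the idea of a mixture of Markov models fitted by EM concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the data have already been reduced to discrete paths through a directed pathway graph; how expression profiles are turned into such paths is not specified in the abstract and is left out. The function name `fit_markov_mixture`, its parameters, the smoothing constant `alpha`, and the toy graph are all hypothetical.

```python
import math
import random


def _normalize(values):
    """Scale values to sum to 1; leave an all-zero row unchanged."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else list(values)


def fit_markov_mixture(paths, n_states, n_comp, edges,
                       n_iter=50, alpha=1e-3, seed=0):
    """EM for a mixture of first-order Markov chains (illustrative only).

    Assumes every path follows edges of the graph, so that each path has
    non-zero probability under every component.
    """
    rng = random.Random(seed)
    states = range(n_states)
    # Transitions are only allowed along edges of the known graph.
    allowed = [[(s, t) in edges for t in states] for s in states]

    weights = [1.0 / n_comp] * n_comp
    init = [_normalize([rng.random() + alpha for _ in states])
            for _ in range(n_comp)]
    trans = [[_normalize([rng.random() + alpha if allowed[s][t] else 0.0
                          for t in states]) for s in states]
             for _ in range(n_comp)]

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each path,
        # computed in log space for numerical stability.
        resp = []
        for path in paths:
            logp = []
            for k in range(n_comp):
                lp = math.log(weights[k]) + math.log(init[k][path[0]])
                for s, t in zip(path, path[1:]):
                    lp += math.log(trans[k][s][t])
                logp.append(lp)
            top = max(logp)
            resp.append(_normalize([math.exp(lp - top) for lp in logp]))

        # M-step: re-estimate parameters from responsibility-weighted counts,
        # with a small pseudo-count alpha to keep probabilities positive.
        for k in range(n_comp):
            r_k = [r[k] for r in resp]
            weights[k] = (sum(r_k) + alpha) / (len(paths) + n_comp * alpha)
            start = [alpha] * n_states
            counts = [[alpha if allowed[s][t] else 0.0 for t in states]
                      for s in states]
            for r, path in zip(r_k, paths):
                start[path[0]] += r
                for s, t in zip(path, path[1:]):
                    counts[s][t] += r
            init[k] = _normalize(start)
            trans[k] = [_normalize(row) for row in counts]
    return weights, init, trans, resp


# Toy graph with two alternative routes from node 0 to node 3 (made up).
toy_edges = {(0, 1), (0, 2), (1, 3), (2, 3)}
toy_paths = [[0, 1, 3]] * 6 + [[0, 2, 3]] * 4
toy_weights, _, toy_trans, _ = fit_markov_mixture(toy_paths, 4, 2, toy_edges)
print(toy_weights)
```

Restricting transitions to the edges of the graph is one plausible reading of "builds a Markov model using a graph structure"; the component-specific transition probabilities can then be inspected for frequently used paths.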
