Inference of Gene Pathways Using Gaussian Mixture Models

Identifying gene-gene interactions and fully characterizing gene pathways are critical to understanding the transcriptional processes that underlie biological processes. Bayesian networks are a powerful framework for inferring gene pathways. We developed a novel Bayesian network that uses Gaussian mixture models to describe continuous gene expression data and to learn gene pathways. Mixture parameters were estimated with an EM algorithm, while the optimal number of mixture components for each gene node and the network topology best supported by the data were identified using the Bayesian information criterion (BIC). We applied the proposed approach to a histone pathway in yeast and to a less explored circadian rhythm pathway in honeybee. Its performance was compared against alternative Bayesian network algorithms that either discretize the gene expression information or use a single distribution instead of a mixture. The evaluation shows that our approach infers the known network more accurately than the other approaches and can effectively predict gene pathways with different topologies from continuous data. In addition, the estimated mixture model provides an intuitive description of the behavior of each gene node, which aids interpretation of the inferred network.
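The estimation step described above (EM for the mixture parameters, BIC for the number of components) can be illustrated for a single gene node with a minimal sketch. It is not the authors' implementation: it treats one node in isolation with no parent genes, uses the common convention BIC = -2 log L + p log n in which lower is better, and the function names, the quantile-based initialization, the variance floor, and the synthetic bimodal data are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def _densities(v, weights, means, variances):
    # Weighted Gaussian density of each component at value v.
    return [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
            for w, m, s in zip(weights, means, variances)]


def log_likelihood(x, weights, means, variances):
    # Guard against log(0) for values far from every component.
    return sum(math.log(max(sum(_densities(v, weights, means, variances)), 1e-300))
               for v in x)


def fit_gmm_1d(x, k, n_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by EM (hypothetical helper)."""
    n = len(x)
    xs = sorted(x)
    mean_all = sum(x) / n
    var_all = sum((v - mean_all) ** 2 for v in x) / n
    # Assumed initialization: means at evenly spaced quantiles, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    floor = max(1e-3 * var_all, 1e-12)  # assumed floor to avoid collapsed components
    prev = -math.inf
    loglik = prev
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for v in x:
            dens = _densities(v, weights, means, variances)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj, floor)
        loglik = log_likelihood(x, weights, means, variances)
        if loglik - prev < tol:
            break
        prev = loglik
    return weights, means, variances, loglik


def bic(loglik, k, n):
    # Free parameters: k means, k variances, k - 1 independent weights.
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def select_components(x, max_k=3):
    """Return (best_k, {k: BIC}) with the lowest-BIC component count."""
    scores = {}
    for k in range(1, max_k + 1):
        loglik = fit_gmm_1d(x, k)[3]
        scores[k] = bic(loglik, k, len(x))
    return min(scores, key=scores.get), scores


def make_bimodal(n_per_mode=150):
    # Synthetic "expression" values: evenly spaced quantiles of two Gaussians.
    lo = [NormalDist(0.0, 0.5).inv_cdf((i + 0.5) / n_per_mode) for i in range(n_per_mode)]
    hi = [NormalDist(5.0, 0.5).inv_cdf((i + 0.5) / n_per_mode) for i in range(n_per_mode)]
    return lo + hi


if __name__ == "__main__":
    best_k, scores = select_components(make_bimodal())
    for k, score in scores.items():
        print(k, round(score, 1))
    print("selected components:", best_k)
```

In a network setting the same kind of score would be computed for each node conditional on a candidate parent set and summed over nodes to compare topologies. That extension is not shown here, and how the paper conditions a node's mixture on its parents is not specified in this section.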
