A framework of integrating gene relations from heterogeneous data sources: an experiment on Arabidopsis thaliana

One of the most important goals of biological investigation is to uncover gene functional relations. In this study we propose a framework for extraction and integration of gene functional relations from diverse biological data sources, including gene expression data, biological literature and genomic sequence information. We introduce a two-layered Bayesian network approach to integrate relations from multiple sources into a genome-wide functional network. An experimental study was conducted on a test-bed of Arabidopsis thaliana. Evaluation of the integrated network demonstrated that relation integration could improve the reliability of relations by combining evidence from different data sources. Domain expert judgments on the gene functional clusters in the network confirmed the validity of our approach for relation integration and network inference.
