A hybrid gene team model and its application to genome analysis.

It is well known that functionally related genes often occur in physical clusters, most notably as operons in bacteria. Building on this fact, the gene team model was recently formulated; it searches for a set of genes that co-occur in a pair of closely related genomes. However, many gene teams, even experimentally verified operons, are frequently scattered in other genomes. The gene team model should therefore be refined to reflect this observation. In this paper, we generalize the gene team model, which looks for physically clustered genes, to multiple genomes with relaxed constraints. We propose a novel hybrid pattern model that combines the set and sequential pattern models. Our model searches for gene clusters with or without a physical proximity constraint. The model was implemented and tested on 97 genomes (120 replicons). We analyzed the results to show the usefulness of the model and compared them with those from the traditional gene team model. We also show that predicted gene teams can be used for several kinds of genome analysis: operon prediction, phylogenetic analysis of organisms, contextual sequence analysis, and genome annotation. Our program is fast enough to be offered as a web service at http://platcom.informatics.indiana.edu/platcom/. Users can select any combination of the 97 genomes to predict gene teams.
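To make the distinction between searching with and without a proximity constraint concrete, the following is a minimal sketch, not the algorithm used in the paper. It assumes a simplified, hypothetical representation in which each genome maps a gene family name to a single integer position (no paralogs), and it uses a hypothetical threshold `delta` on the gap between neighboring genes of the candidate set. The function names `co_occurs`, `is_delta_chain`, and `support` are illustrative only.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical representation: gene family name -> position on the genome.
Genome = Dict[str, int]


def co_occurs(genome: Genome, genes: Iterable[str]) -> bool:
    """Set pattern: every gene of the candidate set is present somewhere."""
    return all(g in genome for g in genes)


def is_delta_chain(genome: Genome, genes: Iterable[str], delta: int) -> bool:
    """Proximity pattern: all genes are present and, once sorted by
    position, neighboring genes are at most `delta` apart."""
    genes = list(genes)
    if not co_occurs(genome, genes):
        return False
    positions = sorted(genome[g] for g in genes)
    return all(b - a <= delta for a, b in zip(positions, positions[1:]))


def support(genomes: List[Genome], genes: Iterable[str],
            delta: Optional[int] = None) -> int:
    """Count the genomes in which the candidate set occurs.

    With delta=None only co-occurrence is required; otherwise the
    genes must also satisfy the proximity constraint.
    """
    genes = list(genes)
    if delta is None:
        return sum(co_occurs(g, genes) for g in genomes)
    return sum(is_delta_chain(g, genes, delta) for g in genomes)


genomes = [
    {"a": 1, "b": 2, "c": 3},    # a, b, c clustered
    {"a": 1, "b": 10, "c": 11},  # a scattered away from b, c
    {"a": 5, "c": 6},            # b missing
]
relaxed = support(genomes, ["a", "b", "c"])          # co-occurrence only
strict = support(genomes, ["a", "b", "c"], delta=2)  # with proximity
```

In this toy input the candidate set is present in two genomes but clustered in only one, which is the kind of scattered occurrence that a model relying purely on physical proximity would miss.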
