On application of directons to functional classification of genes in prokaryotes

Functional classification of genes is one of the most basic problems in genome analysis and annotation. Our analysis of several popular methods for functional classification of genes shows that these methods are not always consistent with one another and may not be specific enough for high-resolution functional annotation. We have developed a method that integrates the genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. Applying our method to 93 proteobacterial genomes has shown that (i) genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and this conservation can be used to improve the functional classification of genes; (ii) our method agrees with the existing popular schemes about as well as they agree with one another, yet it provides functional classification at higher resolution and hence allows functional assignment of (new) genes at a more specific level; and (iii) our method is fairly stable when applied to different genomes.