kGC: Finding Groups of Homologous Genes across Multiple Genomes

We present kGC, a simple method for obtaining groups of homologous genes across multiple (k) organisms. It takes all-against-all BLASTP comparisons as input and produces groups of homologous sequences as output. The algorithm is based on identifying maximal cliques in graphs of sequences and paralogous groups. We applied the method to six complete Actinobacterial genomes and compared the Pfam classification of the resulting homologous groups with that of the groups produced by OrthoMCL. Although kGC is simpler, it produced similar results with respect to Pfam classification and ran in reasonable time.
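
To make the clique-based idea concrete, the following is a minimal sketch and not the kGC implementation. It assumes BLASTP hits are available as (query, subject, e-value) tuples, keeps an edge only for reciprocal hits under a hypothetical e-value cutoff, and enumerates maximal cliques with the Bron-Kerbosch procedure with pivoting, which is an assumption because the abstract does not name the clique algorithm. The names `build_graph`, `maximal_cliques`, and `max_evalue` are illustrative, and the handling of paralogous groups is left out entirely.

```python
from collections import defaultdict


def build_graph(hits, max_evalue=1e-5):
    """Build an undirected similarity graph from BLASTP-style hits.

    hits: iterable of (query, subject, evalue) tuples (assumed input shape).
    An edge is kept only when both directions pass the cutoff
    (the cutoff value is an illustrative assumption).
    """
    passed = {(q, s) for q, s, e in hits if q != s and e <= max_evalue}
    graph = defaultdict(set)
    for q, s in passed:
        if (s, q) in passed:  # require a reciprocal hit
            graph[q].add(s)
            graph[s].add(q)
    return dict(graph)


def maximal_cliques(graph):
    """Yield every maximal clique as a frozenset (Bron-Kerbosch with pivoting)."""

    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)  # r cannot be extended: it is maximal
            return
        # Pick the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


if __name__ == "__main__":
    # Toy hits among genes from three hypothetical genomes A, B and C.
    hits = [
        ("a1", "b1", 1e-30), ("b1", "a1", 1e-28),
        ("a1", "c1", 1e-20), ("c1", "a1", 1e-22),
        ("b1", "c1", 1e-25), ("c1", "b1", 1e-25),
        ("a1", "c2", 1e-10), ("c2", "a1", 1e-9),
        ("b1", "c2", 1e-3),  # too weak and not reciprocal: no edge
    ]
    for clique in maximal_cliques(build_graph(hits)):
        print(sorted(clique))
```

In this toy input, each maximal clique is a set of genes that are all mutually similar, which is the kind of candidate group a clique-based method would report. A real pipeline would also need to decide how to treat overlapping cliques and genes from the same genome.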
