Aggregating Homologous Protein Families in Evolutionary Reconstructions of Herpesviruses

Protein families can be used to reconstruct the evolutionary histories of organisms, and the accuracy with which proteins are assigned to families is critical to the success of such studies. Here we investigate the automatic aggregation of motif-defined homologous protein families for the subsequent reconstruction of their evolutionary histories. We propose a method whose parameters can all be adjusted using the data themselves. The building blocks of the method are: (a) a majority rule for combining the homologous neighbourhood lists of individual proteins into a neighbourhood list for a family, and (b) a robust clustering procedure whose only parameter, the similarity shift, can be estimated from information on proteins of known function. Applying the method to a herpesvirus protein dataset yields insights into the composition of the ancestors of herpesvirus superfamilies. Comparison of the computational reconstructions with more comprehensive analyses also shows how alignment-based between-protein similarity scoring can be improved by using data on gene arrangements.
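The two building blocks named above can be illustrated with a short Python sketch. It is hypothetical and is not the authors' implementation. It assumes that the majority rule keeps a protein in a family's list when that protein appears in more than half of the members' neighbourhood lists. In place of the paper's clustering procedure, it substitutes a simple greedy, add-only procedure: the similarity shift is subtracted from every pairwise similarity, and a cluster is grown while a candidate's summed shifted similarity to the current members is positive. The function names `family_neighbourhood`, `grow_cluster` and `cluster_all` are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Set


def family_neighbourhood(member_lists: List[Set[str]]) -> Set[str]:
    """Assumed majority rule: keep a protein if it occurs in the
    neighbourhood lists of more than half of the family's members."""
    # Each member list is a set, so it contributes at most one vote per protein.
    counts = Counter(p for lst in member_lists for p in lst)
    half = len(member_lists) / 2
    return {p for p, c in counts.items() if c > half}


def grow_cluster(sim: List[List[float]], pool: Iterable[int], shift: float) -> Set[int]:
    """Greedily grow one cluster from the pool under shifted similarity
    sim[i][j] - shift (a simplified, add-only stand-in for the paper's method)."""
    pool = sorted(pool)
    if len(pool) == 1:
        return set(pool)
    # Seed with the most similar pair remaining in the pool.
    i, j = max(((a, b) for a in pool for b in pool if a < b),
               key=lambda ab: sim[ab[0]][ab[1]])
    if sim[i][j] - shift <= 0:
        # No pair is similar enough after the shift: emit a singleton.
        return {pool[0]}
    cluster = {i, j}
    changed = True
    while changed:
        changed = False
        for k in pool:
            if k in cluster:
                continue
            # Add k if its total shifted similarity to current members is positive.
            gain = sum(sim[k][m] - shift for m in cluster)
            if gain > 0:
                cluster.add(k)
                changed = True
    return cluster


def cluster_all(sim: List[List[float]], shift: float) -> List[List[int]]:
    """Extract clusters one by one until every protein is assigned."""
    remaining = set(range(len(sim)))
    clusters = []
    while remaining:
        c = grow_cluster(sim, remaining, shift)
        clusters.append(sorted(c))
        remaining -= c
    return clusters
```

With the hypothetical similarity matrix `[[1, .9, .8, .1], [.9, 1, .7, .2], [.8, .7, 1, .1], [.1, .2, .1, 1]]`, a shift of 0.5 groups proteins 0, 1 and 2 and leaves protein 3 on its own. A shift of 0.85 also splits off protein 2. The result therefore depends strongly on the shift, which is why it has to be calibrated, for instance against proteins of known function.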
