Clustering Proteins and Reconstructing Evolutionary Events

The problem of clustering proteins into homologous families has attracted considerable attention from researchers. On the one hand, many databases of protein families have been built using relatively simple clustering methods combined with extensive manual curation. On the other hand, more elaborate clustering approaches have been tried, but with very limited success. This paper advocates an approach to clustering protein families that uses knowledge of protein functions to adjust the similarity scale shift parameter. We then reconstruct the evolutionary histories of the homologous protein families (HPFs), both to narrow down the choice of clustering solution further and to interpret the clusters.
