An automated phylogenetic key for classifying homeoboxes

When novel gene sequences are discovered, they are usually identified, classified, and annotated based on aggregate measures of sequence similarity. This approach is prone to error. Phylogenetic analysis is a more accurate basis for gene classification and ortholog identification, but it is relatively labor-intensive and computationally demanding. Here we report and demonstrate a rapid new method for gene classification based on phylogenetic principles. Given the phylogeny of a minimal sample of gene family members, our method automatically identifies the amino acids that are phylogenetically characteristic of each class of sequences in the family; it then classifies a novel sequence according to which of these characteristic amino acids it contains. Using a subset of homeobox protein sequences as a test case, we show that our method approximates classification based on full-scale phylogenetic analysis with very high accuracy in a tiny fraction of the time.
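
The two steps described above (find class-characteristic amino acids, then classify a new sequence by their presence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the sequences are already aligned, treats class labels as a stand-in for clades read off a phylogeny, and uses a deliberately strict definition of "characteristic" (a residue fixed within one class and absent from every other class at that position). The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def characteristic_residues(aligned, labels):
    """Map each class to {position: residue} for residues that are fixed
    within the class and absent from all other classes at that position.
    Assumption: a simplified stand-in for tree-based characteristic attributes."""
    by_class = defaultdict(list)
    for seq, label in zip(aligned, labels):
        by_class[label].append(seq)

    length = len(aligned[0])
    diagnostics = {}
    for label, seqs in by_class.items():
        # All sequences that belong to any other class.
        others = [s for other, group in by_class.items()
                  if other != label for s in group]
        chars = {}
        for pos in range(length):
            residues = {s[pos] for s in seqs}
            if len(residues) != 1:
                continue  # position varies within the class
            residue = next(iter(residues))
            if residue == "-":
                continue  # ignore alignment gaps
            if all(o[pos] != residue for o in others):
                chars[pos] = residue
        diagnostics[label] = chars
    return diagnostics


def classify(query, diagnostics):
    """Score an aligned query by how many characteristic residues of each
    class it carries; return (best class or None, per-class scores)."""
    scores = {
        label: sum(1 for pos, residue in chars.items() if query[pos] == residue)
        for label, chars in diagnostics.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        best = None  # no characteristic residue matched any class
    return best, scores


# Toy aligned fragments, invented for illustration (not real homeodomains).
aligned = ["RKRGRQ", "RKRARQ", "PRRSRT", "PRRTRT"]
labels = ["A", "A", "B", "B"]

diagnostics = characteristic_residues(aligned, labels)
predicted, scores = classify("RKRSRQ", diagnostics)
print(diagnostics)
print(predicted, scores)
```

Because classification here only looks up a handful of positions in the query, it avoids rebuilding a tree for every new sequence, which is the source of the speed advantage the abstract describes. A realistic key would derive the characteristic residues from the nodes of the reference phylogeny rather than from flat class labels.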
