Getting Started in Structural Phylogenomics

Structural phylogenomics refers to the combined use of evolutionary and structural information in a bioinformatics analysis. The term phylogenomics refers to two distinct tasks: reconstructing a species phylogeny using multiple genes (for a review, see [1]) and predicting protein function by estimating the evolutionary history of a family of related sequences (i.e., a gene tree or multi-gene tree including gene duplication events) [2]–[4]. In this “Getting Started” article, we focus on the latter task, restricting our discussion to the construction and analysis of phylogenetic trees for amino acid data, and including protein structure data and structure prediction to improve the accuracy of functional annotation. We address the following questions: Why perform a complicated structural phylogenomic analysis when simpler approaches are available? What are the fundamental underlying assumptions of this approach, and what are the implications of any conflicts with these assumptions? What technical challenges do we need to address to achieve the full potential of these ideas?

[1]  Sean R. Eddy, et al.  RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs, 2002, BMC Bioinformatics.

[2]  Kimmen Sjölander, et al.  INTREPID—INformation-theoretic TREe traversal for Protein functional site IDentification, 2008, Bioinform.

[3]  P. Karp, et al.  Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers, 2005, Nucleic acids research.

[4]  A. Sali, et al.  Protein Structure Prediction and Structural Genomics, 2001, Science.

[5]  Michael Y. Galperin, et al.  Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement, and operon disruption, 1998, Silico Biol.

[6]  Derrick J. Zwickl, et al.  Increased taxon sampling greatly reduces phylogenetic error, 2002, Systematic biology.

[7]  Iddo Friedberg, et al.  Automated protein function prediction--the genomic challenge, 2006.

[8]  Nir Ben-Tal, et al.  The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures, 2008, Nucleic Acids Res.

[9]  J A Eisen, et al.  Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis, 1998, Genome research.

[10]  Bin Qian, et al.  Detecting distant homologs using phylogenetic tree‐based HMMs, 2003, Proteins.

[11]  Ute Baumann, et al.  Estimating the annotation error rate of curated GO database sequence annotations, 2007, BMC Bioinformatics.

[12]  Kimmen Sjölander, et al.  SATCHMO: Sequence Alignment and Tree Construction Using Hidden Markov Models, 2003, Bioinform.

[13]  K. Sjölander, et al.  PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification, 2006, Genome Biology.

[14]  Kimmen Sjölander, et al.  Phylogenomic inference of protein molecular function: advances and challenges, 2004, Bioinform.

[15]  S. Brenner.  Errors in genome annotation, 1999, Trends in genetics : TIG.

[16]  David C. Jones, et al.  Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses, 1996, Journal of molecular biology.

[17]  Philip E. Bourne, et al.  Structural Evolution of the Protein Kinase–Like Superfamily, 2005, PLoS Comput. Biol.

[18]  K. Sjölander, et al.  FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function, 2007, BMC evolutionary biology.

[19]  Duncan P. Brown, et al.  Functional Classification Using Phylogenomic Inference, 2006, PLoS Comput. Biol.

[20]  F. Delsuc, et al.  Phylogenomics and the reconstruction of the tree of life, 2005, Nature Reviews Genetics.