An introduction to reconstructing ancestral genomes

Recent advances in high-throughput genomics technologies have resulted in the sequencing of large numbers of (near) complete genomes. These genome sequences are being mined for important functional elements, such as genes. They are also being compared and contrasted in order to identify other functional sequences, such as those involved in the regulation of genes. In cases where DNA sequences from different organisms can be determined to have originated from a common ancestor, it is natural to try to infer the ancestral sequences. The reconstruction of ancestral genomes can lead to insights about genome evolution, and the origins and diversity of function. There are a number of interesting foundational questions associated with reconstructing ancestral genomes: Which statistical models for evolution should be used for making inferences about ancestral sequences? How should extant genomes be compared in order to facilitate ancestral reconstruction? Which portions of ancestral genomes can be reconstructed reliably, and what are the limits of ancestral reconstruction? We discuss recent progress on some of these questions, offer some of our own opinions, and highlight interesting mathematics, statistics, and computer science problems.
