Ancestral gene synteny reconstruction improves extant species scaffolding

We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It can handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Each reconstructed ancestral or extant synteny comes with a support value computed from an exhaustive exploration of the solution space. We first compare our method with a previously published one that pursues the same goal on a small number of genomes with universal single-copy genes. We then test it on the whole Ensembl database, proposing partial ancestral genome structures as well as a more complete scaffolding for the many partially assembled genomes among 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method and show that they are real links in the extant genomes that were missing from the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating fragmentation in some well-assembled genomes and measuring how many of the removed adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on how close the nearest related genomes are.
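
The fragmentation experiment described above can be illustrated with a minimal sketch. It is not the ARt-DeCo implementation: the function name `evaluate_adjacencies`, the gene identifiers, and the representation of an adjacency as an unordered pair of gene IDs (ignoring gene orientation) are assumptions made for illustration only. The sketch assumes that precision is the fraction of proposed adjacencies that were among those hidden by the simulated fragmentation, and that sensitivity is the fraction of hidden adjacencies that were proposed.

```python
def evaluate_adjacencies(removed, predicted):
    """Score proposed adjacencies against those hidden by a simulated fragmentation.

    removed   -- iterable of (gene_a, gene_b) pairs deleted from a well-assembled genome
    predicted -- iterable of (gene_a, gene_b) pairs proposed by a scaffolding method
    Returns (precision, sensitivity). Hypothetical helper, not part of ARt-DeCo.
    """
    # Treat an adjacency as an unordered pair, so (a, b) and (b, a) match.
    removed_set = {frozenset(pair) for pair in removed}
    predicted_set = {frozenset(pair) for pair in predicted}

    true_positives = len(removed_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    sensitivity = true_positives / len(removed_set) if removed_set else 0.0
    return precision, sensitivity


# Toy example with made-up gene identifiers.
removed = [("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g7", "g8")]
predicted = [("g2", "g1"), ("g3", "g4"), ("g6", "g9")]
precision, sensitivity = evaluate_adjacencies(removed, predicted)
```

In this toy case two of the three proposed adjacencies were hidden ones, and two of the four hidden adjacencies were recovered, so a method can score well on precision while still missing half of the removed links.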
