Multi-genome Scaffold Co-assembly Based on the Analysis of Gene Orders and Genomic Repeats

Advances in DNA sequencing technology over the past decades have increased the volume of raw sequenced genomic data available for assembly and analysis. Although many software tools exist for assembling sequenced genomic material, they often have difficulty reconstructing complete chromosomes. Major obstacles include uneven read coverage and long similar subsequences (repeats) in genomes. As a result, assemblers can often reliably reconstruct only long subsequences, called scaffolds.
