Efficient Indexed Alignment of Contigs to Optical Maps

Since its emergence almost 20 years ago (Schwartz et al., Science 1995), optical mapping has undergone a transition from laboratory technique to commercially available data generation method. In line with this transition, it is only relatively recently that optical mapping data has started to be used for scaffolding contigs and validating assemblies in large-scale sequencing projects, for example the goat (Dong et al., Nature Biotech. 2013) and Amborella (Chamala et al., Science 2013) genomes. One major hurdle to the wider use of optical mapping data is the efficient alignment of in silico digested contigs to an optical map. We develop Twin to address this problem. Twin is the first index-based method for aligning in silico digested contigs to an optical map. Our results demonstrate that Twin is an order of magnitude faster than competing methods on the largest genome evaluated. Most importantly, it is specifically designed to handle large eukaryotic genomes, and it is thus the only non-proprietary method capable of completing the alignment for the budgerigar genome in a reasonable amount of CPU time.
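To make the phrase "in silico digested contigs" concrete, the following minimal sketch shows one common way to turn a contig into the ordered list of fragment lengths that is compared against an optical map. It is an illustration only, not Twin's implementation: the function name `in_silico_digest`, the `cut_offset` parameter, and the choice of the EcoRI recognition site `GAATTC` in the usage example are assumptions made here.

```python
from typing import List


def in_silico_digest(contig: str, site: str, cut_offset: int = 0) -> List[int]:
    """Return the ordered fragment lengths obtained by cutting `contig`
    at every occurrence of the recognition sequence `site`.

    `cut_offset` is the position of the cut inside the site
    (hypothetical parameter; e.g. 1 for an enzyme cutting after the first base).
    """
    contig = contig.upper()
    site = site.upper()

    # Collect the cut coordinate for every (possibly overlapping) site match.
    cuts = []
    pos = contig.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = contig.find(site, pos + 1)

    # Fragment lengths are the gaps between consecutive cut coordinates,
    # with the contig start and end as outer boundaries.
    bounds = [0] + cuts + [len(contig)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # Toy contig with two GAATTC sites; cutting after the first base of each.
    print(in_silico_digest("AAGAATTCCCGGAATTCT", "GAATTC", cut_offset=1))
    # prints [3, 9, 6]
```

The resulting list of integers plays the role of the query in alignment: the optical map is itself an ordered list of measured fragment sizes, so aligning a contig amounts to finding where its fragment-length sequence approximately matches a stretch of the map.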
