Technology dictates algorithms: recent developments in read alignment

Aligning sequencing reads to a reference is an essential step in most genomic analysis pipelines. Computational algorithms for read alignment have evolved in step with technological advances, leading to today’s diverse array of alignment methods. We provide a systematic survey of the algorithmic foundations and methodologies of 107 alignment methods for both short and long reads. We also present a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read alignment. Finally, we discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
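To make "aligning reads to a reference" concrete, the Python sketch below shows one of the simplest algorithmic patterns: an exact-match k-mer hash index over the reference, pigeonhole seeding, and verification by Hamming distance. This is a generic textbook technique, not the method of any specific aligner in the survey. It handles substitutions only, not insertions or deletions. The function names (`build_kmer_index`, `align_read`) and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every placement of `read` on the
    reference with at most `max_mismatches` substitutions.

    Pigeonhole principle: split the read into max_mismatches + 1 disjoint
    seeds of length k. Any placement with at most max_mismatches substitutions
    must match at least one seed exactly, so exact seed hits give a
    complete candidate set.
    """
    n_seeds = max_mismatches + 1
    if len(read) < n_seeds * k:
        raise ValueError("read too short for the requested seed layout")

    # Seeding: look up each disjoint seed and convert hits to read starts.
    candidates: set[int] = set()
    for s in range(n_seeds):
        offset = s * k
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    # Verification: keep candidates within the mismatch budget.
    hits = []
    for start in sorted(candidates):
        d = hamming(read, reference[start:start + len(read)])
        if d <= max_mismatches:
            hits.append((start, d))
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTTACG"
    index = build_kmer_index(reference, k=4)
    # One substitution relative to reference[8:18]
    print(align_read("TAGCCGACAG", reference, index, k=4, max_mismatches=1))
    # → [(8, 1)]
```

Seeding keeps the expensive comparison confined to a few candidate positions instead of every offset in the reference. That trade-off between index lookups and verification work is the kind of speed/efficiency effect the evaluation measures. Production aligners replace the plain hash table and Hamming check with richer indexes and gapped alignment.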
