GapsMis: flexible sequence alignment with a bounded number of gaps

Motivation: Recent developments in next-generation sequencing technologies have renewed interest in pairwise sequence alignment techniques, particularly for re-sequencing: the assembly of a genome guided by a reference sequence. Once a factor of the reference sequence has been quickly aligned to the high-quality fragment of a short read, an important problem is to find the best possible alignment between a succeeding factor of the reference sequence and the remaining low-quality part of the read, allowing a number of mismatches and the insertion of gaps in the alignment.

Results: We present GapsMis, a tool for pairwise global and semi-global sequence alignment with a variable, but bounded, number of gaps. It is based on a new algorithm that computes a modified version of the traditional dynamic programming matrix. Millions of pairwise sequence alignments, performed under realistic conditions based on the properties of real full-length genomes, show that GapsMis can increase the accuracy of extending short-read alignments end-to-end compared to more traditional approaches.

Availability: http://www.exelixis-lab.org/gapmis
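To make the problem concrete, the following is a minimal sketch of alignment with a bounded number of gaps. It is NOT the GapsMis algorithm (the abstract does not describe its modified matrix); it is a textbook three-state affine-gap dynamic program with an extra dimension counting gaps. Assumptions, all hypothetical: the function name bounded_gap_alignment, a cost model where each mismatch costs `mismatch` and a gap of length L costs gap_open + L * gap_ext, a gap defined as a maximal run of consecutive insertions or consecutive deletions, and a "semi-global" mode in which an unaligned suffix of the reference factor is free. The tool's actual scoring may differ.

```python
from math import inf


def bounded_gap_alignment(x: str, y: str, k: int, mismatch: int = 1,
                          gap_open: int = 2, gap_ext: int = 1,
                          semi_global: bool = False) -> float:
    """Minimum cost of aligning reference factor x with read y using at most k gaps.

    Hypothetical cost model: each mismatch costs `mismatch`; a gap (maximal run of
    deletions or of insertions) of length L costs gap_open + L * gap_ext.
    If semi_global is True, an unaligned suffix of x is free (assumed convention).
    Returns inf when no alignment with at most k gaps exists.
    """
    m, n = len(x), len(y)

    def table():
        return [[[inf] * (n + 1) for _ in range(m + 1)] for _ in range(k + 1)]

    # M/D/I[g][i][j]: best cost for x[:i] vs y[:j] using exactly g gaps, where the
    # last column is a (mis)match (M), a deletion of x[i-1] (D) or an insertion of y[j-1] (I).
    M, D, I = table(), table(), table()
    M[0][0][0] = 0  # empty alignment: no columns, no gaps

    for g in range(k + 1):
        for i in range(m + 1):
            for j in range(n + 1):
                if i > 0 and j > 0:
                    sub = 0 if x[i - 1] == y[j - 1] else mismatch
                    M[g][i][j] = sub + min(M[g][i - 1][j - 1],
                                           D[g][i - 1][j - 1],
                                           I[g][i - 1][j - 1])
                if i > 0:
                    # Extend the current deletion run (same gap) ...
                    best = D[g][i - 1][j] + gap_ext
                    if g > 0:
                        # ... or open a new deletion run, which consumes one gap.
                        best = min(best, min(M[g - 1][i - 1][j], I[g - 1][i - 1][j])
                                   + gap_open + gap_ext)
                    D[g][i][j] = best
                if j > 0:
                    # Same choice for insertion runs.
                    best = I[g][i][j - 1] + gap_ext
                    if g > 0:
                        best = min(best, min(M[g - 1][i][j - 1], D[g - 1][i][j - 1])
                                   + gap_open + gap_ext)
                    I[g][i][j] = best

    # The whole read must be aligned; in semi-global mode x may stop at any i.
    ends = range(m + 1) if semi_global else [m]
    return min(min(M[g][i][n], D[g][i][n], I[g][i][n])
               for g in range(k + 1) for i in ends)


print(bounded_gap_alignment("ACGGT", "ACGT", k=0))  # no gaps allowed, lengths differ
print(bounded_gap_alignment("ACGGT", "ACGT", k=1))  # one deletion of length 1
print(bounded_gap_alignment("ACAGA", "AAA", k=1, gap_open=0))
print(bounded_gap_alignment("ACAGA", "AAA", k=2, gap_open=0))
```

The last two calls show why the bound matters: with a single gap, deleting two characters of "ACAGA" in one run always leaves a mismatch, whereas two separate gaps (deleting C and G) give a cheaper alignment. The sketch uses O(k·m·n) time and memory; since layer g only reads layers g and g-1, memory could be reduced by keeping two layers at a time.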
