Edlib: a C/C++ library for fast, exact sequence alignment using edit distance

We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest while not lacking in functionality, and can also easily handle very large sequences. Being easy to use, flexible, fast and low on memory usage, we expect it to be a cornerstone for many future bioinformatics tools. Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib, implemented in C/C++ and supported on Linux, MS Windows, and Mac OS. Contact: mile.sikic@fer.hr

[1]  Gad M. Landau,et al.  An efficient string matching algorithm with k differences for nucleotide and amino acid sequences , 2018, Nucleic Acids Res..

[2]  Daniel S. Hirschberg,et al.  A linear space algorithm for computing maximal common subsequences , 1975, Commun. ACM.

[3]  Esko Ukkonen,et al.  Algorithms for Approximate String Matching , 1985, Inf. Control..

[4]  Richard M. Karp,et al.  Faster and More Accurate Sequence Alignment with SNAP , 2011, ArXiv.

[5]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[6]  Knut Reinert,et al.  SeqAn An efficient, generic C++ library for sequence analysis , 2008, BMC Bioinformatics.

[7]  Jeff Daily,et al.  Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments , 2016, BMC Bioinformatics.

[8]  Eugene W. Myers,et al.  A fast bit-vector algorithm for approximate string matching based on dynamic programming , 1998, JACM.

[9]  Gabor T. Marth,et al.  SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications , 2012, PloS one.

[10]  Chun-Nan Hsu,et al.  Weakly supervised learning of biomedical information extraction from curated data , 2016, BMC Bioinformatics.

[11]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[12]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .