Advances in stringology and applications: from combinatorics via genomic analysis to computational linguistics

Written text is considered one of the oldest methods of representing knowledge. A text can be defined as a logical and consistent sequence of symbols that encodes information in a certain language. A straightforward example is natural language, which humans typically use to communicate in spoken or written form. Other examples are DNA, RNA and protein sequences. DNA and RNA are nucleic acids that carry the genetic instructions regulating the development and functioning of living organisms, and they specify the sequence of amino acids within proteins. Proteins are molecules consisting of one or more chains of amino acids, and they participate in virtually every process within cells. DNA and RNA can be represented as sequences of the nucleobases of their nucleotides, and proteins can be represented by the sequence of amino acids encoded in the corresponding gene. A natural problem that emerges when processing such sequences is to determine whether a specific pattern occurs within another string (the exact string matching problem). For natural language texts, an important problem in computational linguistics is finding the occurrences of a given word or sentence in a body of text; similarly, in computational biology, identifying given features in DNA sequences is of great significance. On the other hand, one is often interested in quantifying the likelihood that two strings share the same underlying features, based on an explicit similarity or dissimilarity measure (the approximate string matching problem). Both instances of the string matching problem have been studied thoroughly since the early 1960s.
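The two problem variants can be contrasted with a minimal Python sketch. It is an illustration rather than code from the thesis: exact matching uses the naive scan that compares the pattern at every text position (O(nm) time for a text of length n and a pattern of length m), and the dissimilarity measure is assumed to be the Levenshtein edit distance, computed by the classic dynamic programme that keeps only one previous row. The function names are hypothetical.

```python
def exact_occurrences(text: str, pattern: str) -> list[int]:
    """Return every 0-based position where `pattern` occurs in `text` (naive O(nm) scan)."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the symbols agree
            )
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(exact_occurrences("GATTACAGATTACA", "TAC"))  # [3, 10]
    print(edit_distance("kitten", "sitting"))          # 3
```

The distance of 3 between "kitten" and "sitting" corresponds to two substitutions and one insertion; an approximate matcher would report the positions where such a distance stays below a chosen threshold.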
This thesis contributes several efficient novel and derived solutions (algorithms and/or data structures) for complex problems that originate either from theoretical considerations or from practical applications; it studies their experimental performance and compares the proposed solutions with existing ones. Among the solutions driven by practical problems, several are motivated by real-world applications in molecular biology and computational linguistics. Although the studied problems and their proposed solutions differ in research motivation, they utilise similar tools and methodologies. For example, the seminal Aho-Corasick automaton is employed both for finding a set of motifs in a biological sequence and for detecting spelling mistakes in Arabic text. Similarly, the bit-masking trick that extends the DNA alphabet to accelerate equivalence testing of degenerate characters is used in the same way to extend the Arabic alphabet, in order to measure the similarity between a stem and the derived or inflected forms of a given word.

To: Israa, Hamza, Nasrallah and Laila.
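The two shared tools mentioned in the abstract can be sketched in Python as follows; this is a hedged illustration, not the thesis's own implementation. The first part is a textbook Aho-Corasick automaton (a trie of the patterns, failure links computed breadth-first, and merged output sets) that reports every occurrence of every pattern in one left-to-right pass, whether the patterns are motifs or dictionary words. The second part assumes the standard IUPAC nucleotide codes and one illustrative bit assignment (A=1, C=2, G=4, T=8), so that testing whether two degenerate symbols are equivalent reduces to a single bitwise AND. All function names and the bit layout are hypothetical.

```python
from collections import deque


def build_aho_corasick(patterns):
    """Build the goto, failure and output tables of an Aho-Corasick automaton."""
    goto = [{}]  # goto[state][symbol] -> child state (the trie edges)
    fail = [0]   # failure link of each state (state 0 is the root)
    out = [[]]   # patterns recognised when the automaton reaches each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first traversal: a failure link always points to a shallower state,
    # whose own failure link and output set are therefore already final.
    queue = deque(goto[0].values())  # depth-1 states keep failure link 0
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] += out[fail[s]]  # inherit patterns that are suffixes of this state
    return goto, fail, out


def search(text, patterns):
    """Return (start, pattern) for every occurrence of every pattern, in one pass."""
    goto, fail, out = build_aho_corasick(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return hits


# Assumed encoding: IUPAC nucleotide codes as 4-bit sets (A=1, C=2, G=4, T=8).
IUPAC = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "R": 1 | 4, "Y": 2 | 8, "S": 2 | 4, "W": 1 | 8, "K": 4 | 8, "M": 1 | 2,
    "B": 2 | 4 | 8, "D": 1 | 4 | 8, "H": 1 | 2 | 8, "V": 1 | 2 | 4, "N": 15,
}


def degenerate_match(x, y):
    """Two symbols are equivalent iff the sets of bases they stand for intersect."""
    return (IUPAC[x] & IUPAC[y]) != 0


def degenerate_occurrences(text, pattern):
    """Naive scan that uses the bitwise test in place of character equality."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if all(degenerate_match(text[i + k], pattern[k]) for k in range(m))]


if __name__ == "__main__":
    print(search("ushers", ["he", "she", "his", "hers"]))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
    print(degenerate_occurrences("ACGTAGGT", "RGK"))       # [4, 5]
```

Because equivalence of degenerate symbols costs a single AND regardless of how many bases a code stands for, the same test could replace plain character equality in other matching algorithms; the exact bit layout used in the thesis, and its Arabic-alphabet counterpart, are not reproduced here.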
