An Overview of Sequence Comparison: Time Warps, String Edits, and Macromolecules

A wide variety of applications lead to problems in which sequences of different lengths must be compared, both to measure how different they are and to determine which elements in one sequence correspond to which elements in the other. Successful methods for handling these problems have been repeatedly reinvented, and they incorporate two basic ideas: a systematic concept of distance between sequences, and elegant recursive (dynamic programming) algorithms for doing the necessary computations. The major application areas are speech processing and macromolecular biology. Computer science is another significant application area, and applications have also been made to bird song, handwriting analysis, gas chromatography, geological strata, and text collation. This paper surveys the applications, methods, and theory of sequence comparison.
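To make the two basic ideas concrete, here is a minimal sketch of both combined in the way the abstract describes: a distance between sequences computed by a recursion over prefixes. It assumes unit costs for insertion, deletion and substitution (the Levenshtein distance) for string edits, and an absolute-difference local cost for time warping. The function names `edit_distance` and `dtw` are hypothetical illustrations, not code from the survey. The edit-distance traceback also recovers which elements correspond, which is the second question the abstract raises.

```python
def edit_distance(a, b):
    """Unit-cost edit (Levenshtein) distance plus one optimal alignment."""
    n, m = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete every element of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j  # insert every element of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    # Trace back from d[n][m] to recover which elements correspond.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))  # matched or substituted pair
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], None))  # element deleted from a
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # element inserted from b
            j -= 1
    pairs.reverse()
    return d[n][m], pairs


def dtw(x, y, dist=lambda p, q: abs(p - q)):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # D[i][j] is the cheapest warping path aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            # A path may advance in x, in y, or in both at once,
            # so one sample can be stretched to match several.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

For example, `edit_distance("kitten", "sitting")[0]` is 3 (two substitutions and one insertion), and `dtw([1, 2, 3], [1, 2, 2, 3])` is 0.0 because warping lets the single 2 in the first sequence absorb both 2s in the second. Both recursions fill an n × m table, so they take time proportional to the product of the sequence lengths.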
