String taxonomy using learning automata

A typical syntactic pattern recognition (PR) problem involves comparing a noisy string with every element of a dictionary, X. Classification can be greatly simplified if the dictionary is partitioned into a set of subdictionaries. In this case, the classification can be hierarchical: the noisy string is first compared to a representative element of each subdictionary, and the closest match is then located within the subdictionary whose representative is nearest. The problem of subdividing a set of strings into subsets, where each subset contains "similar" strings, has been referred to as the "String Taxonomy Problem". To our knowledge, there is no reported solution to this problem. In this paper we present a learning-automaton-based solution to string taxonomy. The solution uses the Object Migrating Automaton, whose power in clustering objects and images has been reported previously. The power of the scheme for string taxonomy is demonstrated using random strings and garbled versions of string representations of fragments of macromolecules.
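The hierarchical classification described above can be illustrated with a minimal Python sketch. It is not the paper's method: it assumes the partition into subdictionaries is already given, that each subdictionary is keyed by a hand-picked representative, and that similarity is measured by unit-cost Levenshtein edit distance. The names `edit_distance`, `classify`, and the sample dictionary are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (an assumed similarity measure)."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def classify(noisy: str, subdictionaries: Dict[str, List[str]]) -> str:
    """Two-stage lookup: nearest representative, then nearest member."""
    # Stage 1: compare the noisy string with one representative per subdictionary.
    rep = min(subdictionaries, key=lambda r: edit_distance(noisy, r))
    # Stage 2: search only the chosen subdictionary for the closest match.
    return min(subdictionaries[rep], key=lambda w: edit_distance(noisy, w))


if __name__ == "__main__":
    # Hypothetical partition; keys act as the representative elements.
    subdicts = {
        "pattern": ["pattern", "patter", "lantern"],
        "string": ["string", "strong", "spring"],
    }
    print(classify("sprng", subdicts))
```

With k subdictionaries of roughly equal size, this lookup needs about k + |X|/k comparisons rather than |X|. The trade-off is that the result depends on the quality of the partition: if the true match lies in a subdictionary whose representative is not the nearest one, the two-stage search will miss it.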
