Probabilistic finite-state machines - part I

Probabilistic finite-state machines are used today in a variety of areas in pattern recognition or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition, and machine translation are among them. In Part I of this paper, we survey these generative objects and study their definitions and properties. In Part II, we study the relation of probabilistic finite-state automata to other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art for these objects.
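
The following is a minimal sketch of the kind of generative object surveyed in Part I. It assumes the common textbook formulation of a probabilistic finite-state automaton (PFA), which has three components:

- initial-state probabilities;
- transition probabilities on symbols;
- final (halting) probabilities.

Under this formulation, the probability of a string is the sum, over all paths that read it, of the initial probability times the product of the transition probabilities times the final probability. The sketch computes this sum with a forward pass. The state names, the toy probabilities, and the function names `is_locally_normalized` and `string_probability` are hypothetical illustrations, not the paper's notation or code.

```python
from collections import defaultdict
from math import isclose

# Hypothetical toy PFA (assumption: textbook-style definition, not the paper's notation).
# INIT[q]          : probability of starting in state q
# FINAL[q]         : probability of halting in state q
# TRANS[(q, a, r)] : probability of reading symbol a while moving from q to r
INIT = {"q0": 1.0}
FINAL = {"q0": 0.2, "q1": 0.6}
TRANS = {
    ("q0", "a", "q0"): 0.5,
    ("q0", "b", "q1"): 0.3,
    ("q1", "b", "q1"): 0.4,
}


def is_locally_normalized(init, final, trans):
    """Check that initial probabilities sum to 1 and that, for every state,
    its halting probability plus its outgoing transition probabilities sum to 1."""
    states = set(init) | set(final) | {q for (q, _, _) in trans} | {r for (_, _, r) in trans}
    outgoing = defaultdict(float)
    for (q, _, _), p in trans.items():
        outgoing[q] += p
    if not isclose(sum(init.values()), 1.0):
        return False
    return all(isclose(final.get(q, 0.0) + outgoing[q], 1.0) for q in states)


def string_probability(word, init, final, trans):
    """Forward pass: sum over all paths reading `word` of
    initial prob * product of transition probs * final prob."""
    alpha = dict(init)  # alpha[q]: probability mass in state q after the prefix read so far
    for symbol in word:
        nxt = defaultdict(float)
        for (q, a, r), p in trans.items():
            if a == symbol and q in alpha:
                nxt[r] += alpha[q] * p
        alpha = nxt
    return sum(mass * final.get(q, 0.0) for q, mass in alpha.items())


if __name__ == "__main__":
    print(is_locally_normalized(INIT, FINAL, TRANS))
    for w in ["", "a", "b", "ab", "ba"]:
        print(repr(w), string_probability(w, INIT, FINAL, TRANS))
```

In this toy automaton, reading "ab" follows the single path q0 -a-> q0 -b-> q1 and then halts. That path gives 1.0 * 0.5 * 0.3 * 0.6 = 0.09, while "ba" has no accepting path and gets probability 0.

Local normalization by itself is a per-state condition. The resulting probabilities over all strings sum to 1 only under an additional consistency requirement, roughly that every reachable state can eventually halt.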