Markovian Models for Sequential Data

Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models, and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.
