Structural methods in automatic speech recognition

The past decade has witnessed substantial progress toward the goal of constructing a machine capable of understanding colloquial discourse. Central to this progress has been the development and application of mathematical methods that permit modeling the speech signal as a complex code with several coexisting levels of structure. The most successful of these are "template matching," stochastic modeling, and probabilistic parsing. The manifestation of common themes such as dynamic programming and finite-state descriptions accentuates a superficial likeness among the methods that is often mistaken for the deeper similarity arising from their shared Bayesian foundation. In this paper, we outline the mathematical bases of these methods: invariant metrics, hidden Markov chains, and formal grammars, respectively. We then recount and briefly interpret the results of experiments in speech recognition to which the various methods were applied. Since these mathematical principles seem to bear little resemblance to traditional linguistic characterizations of speech, the success of the experiments is occasionally attributed, even by their authors, merely to excellent engineering. We conclude by speculating that, quite to the contrary, these methods actually constitute a powerful theory of speech that can be reconciled with and elucidate conventional linguistic theories while being used to build truly competent mechanical speech recognizers.
