Basic directions in automatic speech recognition

This paper presents a view of basic problems in automatic speech recognition drawn from the vantage point of applied linguistics. The basic areas of speech production, articulatory phonetics, acoustic analysis, acoustic phonetics, and phonetic sequences of natural speech are crucial to flexible automatic speech recognition, especially for the long-range goal of recognizing continuous-flow, large-vocabulary natural speech of different speakers. Articulatory phonetics provides a link between the physical events of speech and the elements of the phonological code. The processes of speech production are basic to articulatory phonetics, while the acoustic theory of speech production provides a basis for the phonetic interpretation of the acoustic speech waveform. Formant frequencies are particularly important acoustic parameters, and recent improvements in formant tracking are encouraging for speech recognition. Even if complete phonetic recognition is realized, however, there remains the conversion of phonetic strings to lexical strings. This problem requires extensive phonetic analysis of actual spoken language.
