Perceptual linear predictive (PLP) analysis of speech.

A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. The technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. Compared with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. It also accounts well for the effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic speech recognition.
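
The processing chain summarized above can be illustrated with a minimal single-frame sketch in Python with NumPy. It is not the author's implementation, and the abstract gives none of the numerical details, so the following are all assumptions made for illustration: the arcsinh form of the Bark warping, the trapezoid-like critical-band weighting, the constants in the equal-loudness approximation, the cube-root exponent for the power law, the Hamming window, the band count, and the handling of the two edge bands. The function names (`hz_to_bark`, `equal_loudness`, `critical_band_weights`, `levinson`, `plp_frame`) are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Bark warping in an arcsinh form (assumed; not given in the abstract).
    return 6.0 * np.arcsinh(f_hz / 600.0)

def equal_loudness(f_hz):
    # Rational approximation of an equal-loudness weighting.
    # The constants are assumed values, used here only for illustration.
    w2 = (2.0 * np.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def critical_band_weights(n_bins, sample_rate, n_bands):
    # Weights that integrate FFT power bins into bands spaced evenly in Bark.
    f = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bark = hz_to_bark(f)
    centers = np.linspace(0.0, bark[-1], n_bands)
    d = bark[None, :] - centers[:, None]  # bin position relative to band center
    # Assumed shape: flat within +/-0.5 Bark, about 10 dB/Bark below the band
    # and about 25 dB/Bark above it.
    w = 10.0 ** np.minimum(0.0, np.minimum(d + 0.5, -2.5 * (d - 0.5)))
    return w, centers

def levinson(r, order):
    # Levinson-Durbin recursion: autocorrelation -> all-pole coefficients.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def plp_frame(frame, sample_rate, order=5):
    # Short-term power spectrum of one windowed frame.
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    # (1) Critical-band spectral resolution, at roughly 1-Bark spacing.
    n_bands = int(np.ceil(hz_to_bark(sample_rate / 2.0))) + 1
    weights, centers = critical_band_weights(len(power), sample_rate, n_bands)
    theta = weights @ power
    # (2) Equal-loudness weighting at the band center frequencies.
    centers_hz = 600.0 * np.sinh(centers / 6.0)
    # (3) Intensity-loudness power law (cube-root compression, assumed).
    phi = (equal_loudness(centers_hz) * theta) ** (1.0 / 3.0)
    # The weighting vanishes at 0 Hz, so copy the neighbouring bands into the
    # two edge bands (an assumption made to keep the spectrum positive).
    phi[0], phi[-1] = phi[1], phi[-2]
    # Inverse DFT of the auditory spectrum gives an autocorrelation sequence,
    # which is then fitted with a low-order all-pole model.
    r = np.fft.irfft(phi)
    return levinson(r[:order + 1], order)

# Example: a synthetic two-tone frame with a little noise, sampled at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(256) / 8000.0
frame = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t)
         + 0.01 * rng.standard_normal(256))
coeffs, residual = plp_frame(frame, 8000)
```

With `order=5`, `coeffs` holds six values, the leading 1 followed by five predictor coefficients, which shows how compact the resulting description of a frame is. The sketch does not guard against an all-zero (silent) frame, for which the recursion would divide by zero.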
