PLP2: Autoregressive modeling of auditory-like 2-D spectro-temporal patterns

The temporal trajectories of spectral energy in auditory critical bands over 250 ms segments are approximated by an all-pole model, the time-domain dual of conventional linear prediction. This quarter-second auditory spectro-temporal pattern is then further smoothed by iteratively alternating spectral and temporal all-pole modeling. Perceptual Linear Prediction (PLP) uses an autoregressive model in the frequency domain to estimate the peaks of an auditory-like short-term spectral slice. In the same way, PLP2 uses all-pole modeling in both the time and frequency domains to estimate the peaks of a two-dimensional spectro-temporal pattern, an approach motivated by properties of the auditory system.
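
The alternating all-pole smoothing can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The sketch assumes the PLP-style autocorrelation method on both axes. A nonnegative energy sequence is treated as samples of a power spectrum on [0, π]. Autocorrelation lags come from the inverse DFT of its even-symmetric extension, and an all-pole model is then fitted with the Levinson-Durbin recursion. Applied to one critical band's energy trajectory over time, this gives the time-domain dual of linear prediction. Applied to one frame's energies across bands, it gives ordinary PLP-style spectral smoothing. All function names are hypothetical, and so are the model orders (`time_order`, `freq_order`), the number of alternations (`iterations`), and the 15-band by 25-frame pattern (a 10 ms hop over a 250 ms segment). The abstract specifies none of these.

```python
import math
from typing import List, Tuple

# Assumed layout: pattern[b][t] is the nonnegative energy of critical band b in frame t.
Matrix = List[List[float]]


def autocorr_from_power(power: List[float], order: int) -> List[float]:
    """Treat `power` as samples of a power spectrum on [0, pi] (both ends included)
    and return lags 0..order of the inverse DFT of its even-symmetric extension."""
    last = len(power) - 1  # index of the sample at pi
    lags = []
    for m in range(order + 1):
        acc = 0.0
        for k, p in enumerate(power):
            weight = 1.0 if k in (0, last) else 2.0  # interior samples appear twice
            acc += weight * p * math.cos(math.pi * m * k / last)
        lags.append(acc / (2 * last))
    return lags


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the autocorrelation normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Returns [1, a1, ..., ap] and the final prediction-error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("singular autocorrelation; input is too degenerate")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def allpole_envelope(a: List[float], gain: float, n: int) -> List[float]:
    """Sample gain / |A(e^{jw})|^2 at the same n points on [0, pi]."""
    out = []
    for k in range(n):
        w = math.pi * k / (n - 1)
        re = sum(c * math.cos(w * j) for j, c in enumerate(a))
        im = sum(c * math.sin(w * j) for j, c in enumerate(a))
        out.append(gain / (re * re + im * im))
    return out


def allpole_smooth(seq: List[float], order: int) -> List[float]:
    """Replace a nonnegative sequence by its order-`order` all-pole approximation."""
    if len(seq) < 2 or order >= 2 * (len(seq) - 1):
        raise ValueError("sequence too short for the requested model order")
    r = autocorr_from_power(seq, order)
    if r[0] <= 0.0:
        return [0.0] * len(seq)  # all-zero input: nothing to model
    a, err = levinson_durbin(r, order)
    return allpole_envelope(a, err, len(seq))


def plp2_like_smoothing(pattern: Matrix, time_order: int, freq_order: int,
                        iterations: int = 2) -> Matrix:
    """Alternate temporal (per band) and spectral (per frame) all-pole modeling."""
    current = [row[:] for row in pattern]
    n_bands, n_frames = len(current), len(current[0])
    for _ in range(iterations):
        # Temporal all-pole modeling: each band's trajectory over the segment.
        current = [allpole_smooth(row, time_order) for row in current]
        # Spectral all-pole modeling: each frame's slice across the bands.
        for t in range(n_frames):
            smoothed = allpole_smooth([current[b][t] for b in range(n_bands)], freq_order)
            for b in range(n_bands):
                current[b][t] = smoothed[b]
    return current


if __name__ == "__main__":
    # Hypothetical 15-band x 25-frame pattern (10 ms hop over 250 ms) with one energy blob.
    blob = [[0.05 + math.exp(-((b - 6) ** 2) / 8.0 - ((t - 10) ** 2) / 18.0)
             for t in range(25)] for b in range(15)]
    out = plp2_like_smoothing(blob, time_order=6, freq_order=8)
    peak = max((v, b, t) for b, row in enumerate(out) for t, v in enumerate(row))
    print("peak (value, band, frame):", peak)
```

An all-pole model spends its parameters matching peaks rather than valleys. Each pass therefore keeps the dominant energy concentrations along its axis while smoothing the regions between them, and lower orders leave room for fewer peaks. A constant pattern passes through unchanged, which is a quick check that the autocorrelation normalization is consistent.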
