Source enhanced linear prediction of speech incorporating simultaneously masked spectral weighting

Linear prediction is the cornerstone of most modern speech compression algorithms. This paper proposes modifying the calculation of the linear predictor coefficients to incorporate a weighting function based on the simultaneous masking property of the ear. The resulting prediction filter better models the perceptual characteristics of the source and removes more perceptually important information from the input speech signal than a standard LP filter does. When employed in a low-rate speech codec, the net effect is an improvement in subjective quality with no increase in transmission rate and only a modest increase in computational complexity.
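As a rough illustration of where a spectral weighting can enter the LP calculation, the sketch below applies a per-bin weight to the frame's power spectrum, converts the weighted spectrum to autocorrelation lags, and solves for the predictor with the Levinson-Durbin recursion. This is a minimal sketch under stated assumptions, not the paper's method: the abstract does not specify the weighting function, so `weight` is a hypothetical placeholder for a masking-derived weighting, and the function names, the Hamming window, and the FFT size are illustrative choices.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) where a[0] == 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def weighted_lpc(frame, order, weight=None, nfft=512):
    """LP analysis with an optional per-bin spectral weight.

    `weight` is a hypothetical stand-in for a masking-based weighting
    (length nfft // 2 + 1, non-negative). With weight=None this reduces
    to the standard autocorrelation method. nfft should be at least
    len(frame) + order so the needed lags are free of circular aliasing.
    """
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, nfft)) ** 2
    if weight is not None:
        power = power * weight
    # Inverse transform of the power spectrum gives the autocorrelation.
    r = np.fft.irfft(power, nfft)[:order + 1]
    r[0] *= 1.0 + 1e-9  # small regularisation for numerical stability
    return levinson_durbin(r, order)
```

Calling `weighted_lpc(frame, 10)` on a 240-sample frame gives a conventional 10th-order analysis; passing a `weight` array that de-emphasises some frequency bins shifts the fit toward the remaining bins. How such a weight would be derived from a simultaneous masking model is outside this sketch.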
