Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding

Frequency Domain Linear Prediction (FDLP) is a technique for approximating the temporal envelopes of a signal using autoregressive models. In this paper, we propose a wide-band audio coding system that exploits FDLP. Specifically, FDLP is applied to critically sampled sub-bands to model their Hilbert envelopes. The residual of the linear prediction forms the Hilbert carrier, which is transmitted along with the envelope parameters. This process is reversed at the decoder to reconstruct the signal. In objective and subjective quality evaluations, the FDLP-based audio codec at $66$ kbps provides competitive results compared to state-of-the-art codecs at similar bit-rates.
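To make the envelope-modelling step concrete, the following is a minimal sketch of FDLP on a single frame, not the codec described here. It assumes a common formulation in which linear prediction is run on the DCT of the frame, so that the resulting all-pole "spectrum", read along the time axis, approximates the squared Hilbert envelope. The function names (`dct2`, `autocorr`, `levinson`, `fdlp_envelope`), the model order, and the test signal are illustrative assumptions; the sub-band decomposition, the Hilbert carrier, and quantisation are all omitted.

```python
import math

def dct2(x):
    """Unnormalised DCT-II, computed directly (O(N^2), fine for short frames)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def autocorr(c, order):
    """Biased autocorrelation of the sequence c for lags 0..order."""
    n = len(c)
    return [sum(c[i] * c[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k                  # prediction error energy
    return a, err

def fdlp_envelope(x, order):
    """AR approximation of the squared temporal envelope of frame x."""
    n = len(x)
    r = autocorr(dct2(x), order)
    r[0] *= 1.0 + 1e-9                      # tiny white-noise correction (assumption)
    a, err = levinson(r, order)
    env = []
    for t in range(n):
        w = math.pi * (t + 0.5) / n         # time index t maps to AR "frequency" w
        re = sum(a[j] * math.cos(j * w) for j in range(order + 1))
        im = sum(a[j] * math.sin(j * w) for j in range(order + 1))
        env.append(err / (re * re + im * im))
    return env

# Hypothetical test frame: a tone burst centred on sample 180.
frame = [math.exp(-0.5 * ((t - 180) / 8.0) ** 2) * math.cos(0.9 * t)
         for t in range(256)]
envelope = fdlp_envelope(frame, order=20)
peak_index = max(range(len(envelope)), key=envelope.__getitem__)
```

In this sketch `peak_index` should fall close to the centre of the burst, which illustrates the duality with conventional linear prediction: poles that would describe spectral peaks in time-domain LP describe temporal peaks here. In a coder along the lines described above, the coefficients `a` would play the role of the envelope parameters for one sub-band.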
