Paper information - Progress in LPC-based frequency-domain audio coding

This paper describes progress in audio coding schemes based on frequency-domain linear prediction coding (LPC). Although LPC was originally used only in time-domain speech coders, it has been applied to frequency-domain coders since the late 1980s. With progress in associated technologies, frequency-domain LPC-based audio coding has become more promising, and since 2010 it has been used in speech/audio coding standards such as MPEG-D Unified Speech and Audio Coding (USAC) and 3GPP Enhanced Voice Services (EVS). Three of the latest investigations into the representation of LPC envelopes in frequency-domain coders are presented: the harmonic model, frequency-resolution warping, and the Powered All-Pole Spectral Envelope, all of which aim to further enhance coding efficiency.

Hirokazu Kameoka | Takehiro Moriya | Noboru Harada | Yutaka Kamamoto | Ryosuke Sugiura
