Rate-Distortion Optimal Time-Frequency Decompositions for MDCT-based Audio Coding

We investigate the use of nonuniform cosine-modulated filter banks for audio coding. A rate-distortion framework is employed, similar to the work in [1], to select the filter bank structure from a large library of possible frequency decompositions. A new flexible frequency decomposition algorithm is proposed that jointly optimizes the filter bank structure and the bit allocation over the subband channels. Experimental results for both synthetic and real audio signals are provided. The new algorithm shows significant improvements in comparison with fixed uniform frequency decompositions, but special care has to be taken to reduce the size of the decomposition overhead.
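As a rough illustration of how a rate-distortion criterion can select a frequency decomposition, the sketch below prunes a tree of candidate bands using the Lagrangian cost J = D + lambda * R. It is not the algorithm proposed in the paper. The tree layout, the per-band operating points, and the names `best_decomposition`, `rd_points`, `children`, and `overhead_bits` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

RDPoint = Tuple[float, float]  # (rate in bits, distortion); hypothetical format


def best_decomposition(
    node: str,
    rd_points: Dict[str, List[RDPoint]],
    children: Dict[str, List[str]],
    lam: float,
    overhead_bits: float = 0.0,
) -> Tuple[float, List[str]]:
    """Return (Lagrangian cost, chosen bands) for the subtree rooted at `node`.

    Each node is a candidate band with a list of (rate, distortion)
    operating points, e.g. one per quantizer setting. A node is split
    into its children only if that lowers the total cost D + lam * R.
    `overhead_bits` is an assumed side-information cost per split.
    """
    # Cost of coding this band as-is, using its best operating point.
    own_cost = min(d + lam * r for r, d in rd_points[node])

    kids = children.get(node, [])
    if not kids:
        return own_cost, [node]

    # Cost of splitting: side information plus the best cost of each child.
    split_cost = lam * overhead_bits
    split_bands: List[str] = []
    for kid in kids:
        kid_cost, kid_bands = best_decomposition(
            kid, rd_points, children, lam, overhead_bits
        )
        split_cost += kid_cost
        split_bands += kid_bands

    if split_cost < own_cost:
        return split_cost, split_bands
    return own_cost, [node]


# Made-up operating points for a two-level tree (illustrative values only).
demo_rd = {
    "full": [(4.0, 4.0), (16.0, 1.0)],
    "low": [(4.0, 0.5), (8.0, 0.1)],
    "high": [(2.0, 1.0), (6.0, 0.2)],
}
demo_children = {"full": ["low", "high"]}
```

With these made-up numbers, a small lambda (rate is cheap) favors splitting into the "low" and "high" bands, while a larger lambda keeps the single "full" band. Raising `overhead_bits` also pushes the choice toward the unsplit band, which reflects the abstract's remark about decomposition overhead.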

[1]  Seymour Shlien,et al.  The modulated lapped transform, its time-varying forms, and its applications to audio coding standards , 1997, IEEE Trans. Speech Audio Process..

[2]  P. Yip,et al.  Discrete Cosine Transform: Algorithms, Advantages, Applications , 1990 .

[3]  Henrique S. Malvar,et al.  Fast algorithms for orthogonal and biorthogonal modulated lapped transforms , 1998, 1998 IEEE Symposium on Advances in Digital Filtering and Signal Processing. Symposium Proceedings (Cat. No.98EX185).

[4]  Antonio Ortega,et al.  Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders , 1994, IEEE Trans. Image Process..

[5]  W. Bastiaan Kleijn,et al.  Rate-distortion optimized quantization in multistage audio coding , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  C. K. Yuen,et al.  Digital Filters , 1979, IEEE Transactions on Systems, Man, and Cybernetics.

[7]  Chi-Min Liu,et al.  The Efficient Temporal Noise Shaping Method , 2004 .

[8]  M.G. Christensen,et al.  Low complexity rate-distortion optimized time-segmentation for audio coding , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[9]  G. Matviyenko Optimized Local Trigonometric Bases , 1996 .

[10]  Zixiang Xiong,et al.  Scalable audio coding using the nonuniform modulated complex lapped transform , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[11]  Martin Link An Attack Processing of Audio Signals for Optimizing the Temporal Characteristics of a Low Bit-Rate Audio Coding System , 1993 .

[12]  Hugo Fastl,et al.  Psychoacoustics: Facts and Models , 1990 .

[13]  J. D. Johnston,et al.  Continuously signal-adaptive filterbank for high-quality perceptual audio coding , 1997, Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.

[14]  Ronald R. Coifman,et al.  Entropy-based algorithms for best basis selection , 1992, IEEE Trans. Inf. Theory.

[15]  Kenneth Rose,et al.  A trellis-based optimal parameter value selection for audio coding , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[16]  Louis Dunn Fielder,et al.  ISO/IEC MPEG-2 Advanced Audio Coding , 1997 .

[17]  Bernd Edler Codierung von Audiosignalen mit überlappender Transformation und adaptiven Fensterfunktionen , 1989 .

[18]  David L. Neuhoff,et al.  Bennett's integral for vector quantizers , 1995, IEEE Trans. Inf. Theory.

[19]  Henrique S. Malvar The LOT: a link between block transform coding and multirate filter banks , 1988, IEEE International Symposium on Circuits and Systems.

[20]  Henrique S. Malvar Biorthogonal and nonuniform lapped transforms for transform coding with reduced blocking and ringing artifacts , 1998, IEEE Trans. Signal Process..

[21]  Henrique S. Malvar Modulated QMF filter banks with perfect reconstruction , 1990 .

[22]  Mark B. Sandler,et al.  MDCT analysis of sinusoids: exact results and applications to coding artifacts reduction , 2004, IEEE Transactions on Speech and Audio Processing.

[23]  Jürgen Herre,et al.  Temporal Noise Shaping, Quantization and Coding Methods in Perceptual Audio Coding: A Tutorial Introduction , 1999 .

[24]  Kenneth Rose,et al.  A conditional enhancement-layer quantizer for the scalable MPEG advanced Audio Coder , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25]  K.R. Rao,et al.  An efficient implementation of the forward and inverse MDCT in MPEG audio coding , 2001, IEEE Signal Processing Letters.

[26]  Jesper Jensen,et al.  Adaptive time-segmentation for speech coding with limited delay , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[27]  Guangyu Wang,et al.  Time-varying MMSE modulated lapped transform and its applications to transform coding for speech and audio signals , 2002, Signal Process..

[28]  D. B. Preston Spectral Analysis and Time Series , 1983 .

[29]  Richard Heusdens,et al.  RD optimal time segmentations for the time-varying MDCT , 2004, 2004 12th European Signal Processing Conference.

[30]  Mark Sandler,et al.  MDCT Analysis of Sinusoids and Applications to Coding Artifacts Reduction. , 2003 .

[31]  Ronald R. Coifman,et al.  Signal processing and compression with wavelet packets , 1994 .

[32]  Frank Baumgarte,et al.  Improved audio coding using a psychoacoustic model based on a cochlear filter bank , 2002, IEEE Trans. Speech Audio Process..

[33]  Oliver Read,et al.  From Tin Foil to Stereo , 1960 .

[34]  Stéphane Mallat,et al.  A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[35]  A. Spanias,et al.  Perceptual coding of digital audio , 2000, Proceedings of the IEEE.

[36]  P. Noll,et al.  A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms , 1995, Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Acoustics.

[37]  Eric Allamanche,et al.  MPEG-4 Low Delay Audio Coding Based on the AAC Codec , 1999 .

[38]  A. Papoulis,et al.  The Fourier Integral and Its Applications , 1963 .

[39]  Marina Bosi,et al.  High-Quality, Low-Rate Audio Transform Coding for Transmission and Multimedia Applications , 1992 .

[40]  Ramesh A. Gopinath,et al.  Theory of modulated filter banks and modulated wavelet tight frames , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[41]  Charles A. Bouman,et al.  Time-frequency analysis with best local cosine bases , 2004, IS&T/SPIE Electronic Imaging.

[42]  Sean A. Ramprashad The multimode transform predictive coding paradigm , 2003, IEEE Trans. Speech Audio Process..

[43]  Toby Berger,et al.  Rate distortion theory : a mathematical basis for data compression , 1971 .

[44]  Charles A. Bouman,et al.  New algorithms for best local cosine basis search , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[45]  Pierre Duhamel,et al.  A fast algorithm for the implementation of filter banks based on 'time domain aliasing cancellation' , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[46]  Kristofer Kjörling,et al.  Spectral Band Replication, a Novel Approach in Audio Coding , 2002 .

[47]  Roch Lefebvre,et al.  Universal speech/audio coding using hybrid ACELP/TCX techniques , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[48]  Henrique S. Malvar Extended lapped transforms: fast algorithms and applications , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[49]  Antonio Ortega,et al.  Optimal buffer-constrained source quantization and fast approximations , 1992, [Proceedings] 1992 IEEE International Symposium on Circuits and Systems.

[50]  R. Bellman Dynamic programming. , 1957, Science.

[51]  Carl Taswell,et al.  Empirical Tests for Evaluation of Multirate Filter Bank Parameters , 2001 .

[52]  Henrique S. Malvar Lapped transforms for efficient transform/subband coding , 1990, IEEE Trans. Acoust. Speech Signal Process..

[53]  Jesper Jensen,et al.  A Perceptual Model for Sinusoidal Audio Coding Based on Spectral Integration , 2005, EURASIP J. Adv. Signal Process..

[54]  James David Johnston,et al.  Exploiting Both Time and Frequency Structure in a System That Uses an Analysis/Synthesis Filterbank with High Frequency Resolution , 1997 .

[55]  Kenneth Rose,et al.  Approaches to Improve Quantization Performance Over the Scalable Advanced Audio Coder , 2002 .

[56]  Dennis Gabor,et al.  Theory of communication , 1946 .

[57]  Chi-Wah Kok,et al.  Fast algorithm for computing discrete cosine transform , 1997, IEEE Trans. Signal Process..

[58]  K. W. Cattermole The Fourier Transform and its Applications , 1965 .

[59]  Antonio Ortega,et al.  Optimal trellis-based buffered compression and fast approximations , 1994, IEEE Trans. Image Process..

[60]  Jae S. Lim,et al.  Incorporation of biorthogonality into lapped transforms for audio compression , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[61]  N. Levinson The Wiener (Root Mean Square) Error Criterion in Filter Design and Prediction , 1946 .

[62]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[63]  Miikka Vilermo,et al.  Modified Discrete Cosine Transform: Its Implications for Audio Coding and Error Concealment , 2003 .

[64]  Harvey J. Everett Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources , 1963 .

[65]  Dimitri P. Bertsekas,et al.  Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[66]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[67]  R. Heusdens,et al.  Subband merging in cosine-modulated filter banks , 2003, IEEE Signal Processing Letters.

[68]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[69]  Claus Bauer,et al.  Joint optimization of scale factors and Huffman code books for MPEG-4 AAC , 2006, IEEE Transactions on Signal Processing.

[70]  P. P. Vaidyanathan,et al.  Cosine-modulated FIR filter banks satisfying perfect reconstruction , 1992, IEEE Trans. Signal Process..

[71]  Alan B. Bradley,et al.  Filter bank design based on time domain aliasing cancellation with non-identical windows , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[72]  Ingrid Daubechies,et al.  The wavelet transform, time-frequency localization and signal analysis , 1990, IEEE Trans. Inf. Theory.

[73]  Werner Oomen,et al.  Parametric Coding for High-Quality Audio , 2002 .

[74]  Henrique S. Malvar A modulated complex lapped transform and its applications to audio processing , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[75]  Henrique S. Malvar,et al.  Signal processing with lapped transforms , 1992 .

[76]  Wei Ding,et al.  Rate control of MPEG video coding and recording by rate-quantization modeling , 1996, IEEE Trans. Circuits Syst. Video Technol..

[77]  Miikka Vilermo,et al.  Energy Compaction Property of the MDCT in Comparison with Other Transforms , 2000 .

[78]  Kannan Ramchandran,et al.  Tilings of the time-frequency plane: construction of arbitrary orthogonal bases and fast tiling algorithms , 1993, IEEE Trans. Signal Process..

[79]  D. Sevic,et al.  A new efficient implementation of the oddly stacked Princen-Bradley filter bank , 1994, IEEE Signal Processing Letters.

[80]  David J. Sheskin,et al.  Handbook of Parametric and Nonparametric Statistical Procedures , 1997 .

[81]  L. Yaroslavsky,et al.  On the relationship between MDCT, SDPT and DFT , 2000, WCC 2000 - ICSP 2000. 2000 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000.

[82]  Friedrich K. Engel Magnetic Tape-From the Early Days to the Present , 1988 .

[83]  Richard Heusdens,et al.  Rate-distortion optimal sinusoidal modeling of audio and speech using psychoacoustical matching pursuits , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[84]  S. L. Regunathan,et al.  Near-optimal selection of encoding parameters for audio coding , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[85]  S.L.J.D.E. van de Par,et al.  Rate-distortion optimized hybrid sound coding , 2005 .

[86]  John Princen,et al.  Subband/Transform coding using filter bank designs based on time domain aliasing cancellation , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[87]  G. D. Forney, Jr.  The Viterbi algorithm , 1973 .

[88]  Per Ekstrand BANDWIDTH EXTENSION OF AUDIO SIGNALS BY SPECTRAL BAND REPLICATION , 2002 .

[89]  William C. Treurniet,et al.  Objective Perceptual Measurement of Audio Quality , 1996 .

[90]  M. Vetterli,et al.  Time-varying modulated lapped transforms , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[91]  K Ramchandran,et al.  Best wavelet packet bases in a rate-distortion sense , 1993, IEEE Trans. Image Process..

[92]  Richard Heusdens,et al.  Optimal time segmentation for overlap-add systems with variable amount of window overlap , 2005, IEEE Signal Processing Letters.

[93]  P. Lafrance,et al.  Digital filters , 1974, Proceedings of the IEEE.

[94]  Jelena Kovacevic,et al.  Wavelets and Subband Coding , 2013, Prentice Hall Signal Processing Series.

[95]  R. Heusdens,et al.  Flexible frequency decompositions for cosine-modulated filter banks , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[96]  Antonio Ortega,et al.  Bit-rate control using piecewise approximated rate-distortion characteristics , 1998, IEEE Trans. Circuits Syst. Video Technol..

[97]  Michael T. Orchard,et al.  Flexible tree-structured signal expansions using time-varying wavelet packets , 1997, IEEE Trans. Signal Process..

[98]  Henrique S. Malvar Extended lapped transforms: properties, applications, and fast algorithms , 1992, IEEE Trans. Signal Process..

[99]  P. P. Vaidyanathan,et al.  New results on cosine-modulated FIR filter banks satisfying perfect reconstruction , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[100]  P. Vaidyanathan Multirate Systems And Filter Banks , 1992 .

[101]  H. S. Malvar Efficient signal coding with hierarchical lapped transforms , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[102]  James David Johnston,et al.  Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS) , 1996 .

[103]  Zixiang Xiong,et al.  Audio coding and image denoising based on the nonuniform modulated complex lapped transform , 2005, IEEE Transactions on Multimedia.

[104]  Henrique S. Malvar Enhancing the performance of subband audio coders for speech signals , 1998, ISCAS '98. Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (Cat. No.98CH36187).

[105]  Richard E. Blahut,et al.  Computation of channel capacity and rate-distortion functions , 1972, IEEE Trans. Inf. Theory.

[106]  Philip E. Gill,et al.  Practical optimization , 1981 .

[107]  J. D. Johnston,et al.  Estimation of perceptual entropy using noise masking criteria , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[108]  Jesper Jensen,et al.  Bit-Rate Scalable Intraframe Sinusoidal Audio Coding Based on Rate-Distortion Optimization , 2006 .

[109]  Jörg Kliewer,et al.  Audio subband coding with improved representation of transient signal segments , 1998, 9th European Signal Processing Conference (EUSIPCO 1998).

[110]  Pim Korten,et al.  High-Resolution Spherical Quantization of Sinusoidal Parameters , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[111]  Stéphane Mallat,et al.  On denoising and best signal representation , 1999, IEEE Trans. Inf. Theory.

[112]  David Leporini,et al.  Bayesian approach to best basis selection , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[113]  Martin Vetterli,et al.  Optimal bit allocation with side information , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[114]  S. Mallat A wavelet tour of signal processing , 1998 .

[115]  J. Tukey,et al.  An algorithm for the machine calculation of complex Fourier series , 1965 .

[116]  P. Prandoni Optimal segmentation techniques for piecewise stationary signals , 1999 .

[117]  Antonio Ortega,et al.  Rate-distortion methods for image and video compression , 1998, IEEE Signal Process. Mag..

[118]  Bernd Edler,et al.  Improved Quantization and Lossless Coding for Subband Audio Coding , 2005 .

[119]  James Durbin,et al.  The fitting of time series models , 1960 .

[120]  Pierre Moulin Signal estimation using adapted tree-structured bases and the MDL principle , 1996, Proceedings of Third International Symposium on Time-Frequency and Time-Scale Analysis (TFTS-96).

[121]  Paolo Prandoni,et al.  R/D optimal linear prediction , 2000, IEEE Trans. Speech Audio Process..

[122]  W. R. Bennett,et al.  Spectra of quantized signals , 1948, Bell Syst. Tech. J..

[123]  George S. Moschytz,et al.  Audio coding based on rate distortion and perceptual optimization , 2000, SPIE Defense + Commercial Sensing.

[124]  Richard Heusdens,et al.  Upfront Time Segmentation Methods for Transform Coding of Audio , 2005 .

[125]  Y. Wang,et al.  Some peculiar properties of the MDCT , 2000, WCC 2000 - ICSP 2000. 2000 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000.

[126]  Peter Noll,et al.  Digital Coding of Waveforms , 1986 .

[127]  Deepen Sinha,et al.  A New Class of Smooth Power Complementary Windows and Their Application to Audio Signal Processing , 2005 .

[128]  John Princen,et al.  Analysis/Synthesis filter bank design based on time domain aliasing cancellation , 1986, IEEE Trans. Acoust. Speech Signal Process..

[129]  Aníbal J. S. Ferreira Convolutional effects in transform coding with TDAC: an optimal window , 1996, IEEE Trans. Speech Audio Process..

[130]  Søren Holdt Jensen,et al.  On perceptual distortion minimization and nonlinear least-squares frequency estimation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.