A review of algorithms for perceptual coding of digital audio signals

Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have become international and/or commercial product standards. This paper reviews these algorithms, covering both research and standardization activities. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components and subband signal decompositions. The discussion concentrates on the architectures and applications of those techniques that use psychoacoustic models to exploit efficiently the masking characteristics of the human receiver. Several algorithms that have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.
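
As a rough illustration of the kind of psychoacoustic quantity such coders rely on, the sketch below evaluates two commonly used closed-form approximations: a mapping from frequency in Hz to the critical-band (Bark) scale, and the absolute threshold of hearing in quiet. Neither formula is given in the abstract; the function names `hz_to_bark` and `threshold_in_quiet_db_spl` are hypothetical, and the constants are the textbook approximations, assumed here for illustration rather than taken from any particular standard.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumed closed-form approximation, not taken from the abstract.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def threshold_in_quiet_db_spl(f_hz):
    """Approximate absolute threshold of hearing (dB SPL) for f_hz > 0.

    Assumed closed-form approximation, not taken from the abstract.
    """
    khz = f_hz / 1000.0
    return (
        3.64 * khz ** -0.8
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
        + 1e-3 * khz ** 4
    )


if __name__ == "__main__":
    # Print both quantities at a few audio-band frequencies.
    for f in (100.0, 1000.0, 3300.0, 10000.0):
        print(f, round(hz_to_bark(f), 2), round(threshold_in_quiet_db_spl(f), 2))
```

Under these approximations the threshold is lowest in the region of 3 to 4 kHz and rises toward both ends of the audio band, so a coder can tolerate more quantization noise at very low and very high frequencies than in the mid band. Masking by the signal itself raises the tolerable noise further; that part is not modeled in this sketch.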
