Perceptual Coding of Narrowband Audio Signals

Emerging applications such as Internet broadcasting and communications, consumer multimedia products, digital AM broadcasting, and satellite networks require moderate audio quality without annoying artifacts at bit rates below 16 kbit/s. Although speech coders provide high speech quality at bit rates around 8 kbit/s, they perform poorly when encoding audio signals. In this thesis, we present a novel transform coding paradigm based on the characteristics of the human hearing system. The proposed encoder, the Narrowband Perceptual Audio Coder (NPAC), can accommodate a wide range of narrowband audio inputs without annoying artifacts at bit rates down to 8 kbit/s. NPAC employs a variety of algorithms to remove the perceptually irrelevant parts and the statistical redundancies of the input signal. The new algorithms used in NPAC include a perceptual error measure for training the codebooks and selecting the best codewords, perceptually based bit allocation algorithms, and an adaptive predictive scheme for vector quantizing the scale factors. The proposed encoder has moderate complexity and delivers good quality for narrowband audio inputs at around 1 bit/sample. Informal subjective tests were conducted to compare NPAC with a commercially available 8 kbit/s audio coder. The test results show that NPAC performs better than the commercial coder for both music and speech inputs.
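
The figure of about 1 bit/sample is consistent with the 8 kbit/s bit rate if the narrowband input is sampled at 8 kHz, the usual rate for telephone-bandwidth signals. The abstract does not state the sampling rate, so f_s = 8 kHz is an assumption in the short calculation below, where R is the bit rate and b the average number of bits per sample:

```latex
% Assumption: sampling rate f_s = 8 kHz (not stated in the abstract)
\[
  b \;=\; \frac{R}{f_s}
    \;=\; \frac{8000~\text{bit/s}}{8000~\text{samples/s}}
    \;=\; 1~\text{bit/sample}
\]
```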
