A Fractal Self-Similarity Model for the Spectral Representation of Audio Signals

When conventional audio compression algorithms are applied to low bit rate audio coding, one faces an unsatisfactory tradeoff between coarser quantization and reduced audio bandwidth. Frequency Extension has therefore emerged as an important tool for achieving satisfactory performance in low bit rate audio codecs. In this paper we describe one of a newer class of Frequency Extension techniques that operate directly on a high frequency resolution representation of the signal (e.g., the MDCT). This particular technique is based on a Fractal Self-Similarity Model (FSSM) for the short-term frequency representation of the signal. The FSSM, which may include multiple dilation and translation terms, has been found to be effective for a wide variety of speech and music signals and provides a compact description of the long-term correlation that may exist in the frequency domain. The high frequency resolution of the MDCT aids accurate estimation of the model parameters, and the model in turn has shown promise as a Frequency Extension tool that delivers detailed, natural-sounding quality at low bit rates. The structure of the FSSM, issues related to parameter estimation, and the application of the model to audio coding at bit rates of 8-48 kbps are discussed. Audio demos are available at http://www.atc-labs.com/fssm.
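The exact model equation is not given here. A minimal sketch, under assumptions of our own, reads a single dilation/translation term as predicting each high-band MDCT coefficient at bin k from the low-band coefficient at bin round((k - t) / d), scaled by a gain g. Here d is a dilation (index scaling), t a translation (index shift), and g the least-squares fit for each candidate (d, t) on a small search grid. The names `predict_band` and `fit_term`, the index mapping, and the grid search are hypothetical illustrations, not the estimator used in the paper. A multi-term model would add further (d, t, g) triples.

```python
import random


def predict_band(spectrum, k_lo, k_hi, dilation, translation):
    """Hypothetical single FSSM-style term (an assumption, not the paper's
    formulation): target bin k in [k_lo, k_hi) copies the coefficient at
    source bin round((k - translation) / dilation)."""
    pred = []
    for k in range(k_lo, k_hi):
        src = round((k - translation) / dilation)  # Python rounds half to even
        # Only low-band bins (already known to the decoder) may be copied.
        pred.append(spectrum[src] if 0 <= src < k_lo else 0.0)
    return pred


def fit_term(spectrum, k_lo, k_hi, dilations, translations):
    """Grid search over (dilation, translation). For each candidate the gain
    is the least-squares solution g = <x, p> / <p, p>. Returns the tuple
    (error, dilation, translation, gain) with the smallest residual energy."""
    target = spectrum[k_lo:k_hi]
    best = None
    for d in dilations:
        for t in translations:
            p = predict_band(spectrum, k_lo, k_hi, d, t)
            energy = sum(v * v for v in p)
            if energy == 0.0:
                continue  # candidate maps nothing from the low band
            g = sum(x * v for x, v in zip(target, p)) / energy
            err = sum((x - g * v) ** 2 for x, v in zip(target, p))
            if best is None or err < best[0]:
                best = (err, d, t, g)
    return best


# Usage: a synthetic spectrum whose upper half is, by construction, a
# half-amplitude copy of the lower half dilated by 2 with no translation.
random.seed(0)
N, K_LO = 256, 128
spec = [random.gauss(0.0, 1.0) for _ in range(K_LO)] + [0.0] * (N - K_LO)
spec[K_LO:] = [0.5 * v for v in predict_band(spec, K_LO, N, 2.0, 0)]
err, d, t, g = fit_term(spec, K_LO, N, [1.5, 2.0, 2.5], range(-4, 5))
print(d, t, round(g, 3))  # expected: 2.0 0 0.5
```

A dilation of 2 maps a harmonic at low-band bin m to bin 2m. This is one way a frequency-domain self-similarity term can carry harmonic structure into the high band. Real MDCT spectra would need a finer parameter search and a perceptually weighted error.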
