Perceptual audio modeling with exponentially damped sinusoids

This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to modeling the transient segments that are commonly found in audio signals. Total least squares (TLS) algorithms are applied for the automatic extraction of the modeling parameters in the ESM, i.e., the amplitude, phase, frequency and damping factor of each of a user-defined number of damped sinusoids. In order to turn the SNR optimization criterion of these TLS algorithms into a perceptual modeling strategy, we use the psychoacoustic model of MPEG-1 Layer 1 in a subband TLS-ESM scheme. This allows us to model each subband signal in accordance with its perceptual relevance, thereby lowering the number of modeling components required for a given modeling quality. Simulations and listening tests confirm that perceptual ESM achieves the same perceived quality as plain ESM while using substantially fewer components, and provide support for applying the new model in the fields of parametric audio processing and coding.
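
As a rough illustration of the signal model described above, the following sketch synthesizes a frame as a sum of exponentially damped sinusoids and measures the SNR between a frame and a reduced-order model of it. It covers only the synthesis side; the TLS parameter extraction and the psychoacoustic weighting are not shown. The function names, the parameter values and the sign convention for the damping factor (negative values decay) are assumptions made for this example and are not taken from the paper.

```python
import math


def esm_synthesize(components, num_samples):
    """Synthesize a sum of exponentially damped sinusoids.

    Each component is a tuple (amplitude, phase, frequency, damping):
    frequency is in radians per sample, and a negative damping value
    gives a decaying envelope (assumed convention for this sketch).
    """
    signal = []
    for n in range(num_samples):
        sample = 0.0
        for amplitude, phase, frequency, damping in components:
            envelope = amplitude * math.exp(damping * n)
            sample += envelope * math.cos(frequency * n + phase)
        signal.append(sample)
    return signal


def snr_db(reference, model):
    """Signal-to-noise ratio in dB of a model against a reference frame."""
    signal_energy = sum(x * x for x in reference)
    error_energy = sum((x - y) ** 2 for x, y in zip(reference, model))
    if error_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


# Hypothetical two-component frame: a slowly decaying partial and a
# faster-decaying, transient-like partial.
components = [
    (1.0, 0.0, 0.2 * math.pi, -0.01),
    (0.5, math.pi / 4, 0.45 * math.pi, -0.05),
]
frame = esm_synthesize(components, 256)

# Model the same frame with only the first component and measure the SNR.
one_term = esm_synthesize(components[:1], 256)
print(snr_db(frame, one_term))
```

Dropping a component lowers the SNR by an amount that depends on that component's energy. The SNR criterion treats all error energy alike, whereas the perceptual scheme in the abstract allocates components to subbands according to their perceptual relevance.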
