A perceptual subspace approach for modeling of speech and audio signals with damped sinusoids

The problem of modeling a signal segment as a sum of exponentially damped sinusoidal components arises in many application areas, including speech and audio processing. Model parameters are often estimated using subspace-based techniques, which arrange the input signal in a structured matrix and exploit the so-called shift-invariance property of certain vector spaces associated with that matrix. A problem with this class of estimation algorithms, when used for speech and audio processing, is that the perceptual importance of the sinusoidal components is not taken into account. In this work we propose a solution to this problem. In particular, we show how to combine well-known subspace-based estimation techniques with a recently developed perceptual distortion measure in order to obtain an algorithm for extracting perceptually relevant model components. In analysis-synthesis experiments with wideband audio signals, objective and subjective evaluations show that the proposed algorithm improves perceived signal quality considerably over traditional subspace-based analysis methods.
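As an illustration of the traditional subspace-based estimation step mentioned above, the following is a minimal sketch, assuming a Hankel data matrix and a least-squares solution of the shift-invariance equation (an HSVD/ESPRIT-style procedure). It does not include the perceptual distortion measure proposed in this work, and the function name `estimate_damped_sinusoids`, the matrix dimensions, and the test signal are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def estimate_damped_sinusoids(x, order):
    """Estimate poles and complex amplitudes of `order` damped exponentials.

    Hypothetical helper: models x[n] ~= sum_k amps[k] * poles[k]**n.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    rows = n // 2
    cols = n - rows + 1
    # Structured (Hankel) data matrix: H[i, j] = x[i + j].
    H = np.array([x[i:i + cols] for i in range(rows)])
    # Signal subspace: the leading `order` left singular vectors.
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :order]
    # Shift invariance: Us without its last row, times Phi,
    # approximates Us without its first row.
    Phi, *_ = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)
    # The eigenvalues of Phi are the signal poles exp(-damping + 1j * freq).
    poles = np.linalg.eigvals(Phi)
    # Complex amplitudes by least squares on the Vandermonde matrix of poles.
    V = np.vander(poles, n, increasing=True).T  # shape (n, order)
    amps, *_ = np.linalg.lstsq(V, x, rcond=None)
    return poles, amps

# Noise-free test segment: two real damped sinusoids, i.e. four complex poles.
t = np.arange(64)
x = np.exp(-0.05 * t) * np.cos(0.6 * t) + 0.5 * np.exp(-0.02 * t) * np.cos(1.7 * t)
poles, amps = estimate_damped_sinusoids(x, order=4)
freqs = np.angle(poles)            # radians per sample
dampings = -np.log(np.abs(poles))  # per sample
```

In this sketch, components are selected purely by singular value, that is, by energy. This is the limitation the abstract points to: a perceptual criterion would instead rank or select components by their audible contribution.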
