Informed spectral analysis: audio signal parameter estimation using side information

Parametric models are of great interest for representing and manipulating sounds. However, the quality of the resulting signals depends on the precision of the parameters. When the signals are available, these parameters can be estimated, but the presence of noise degrades the precision of the estimation. Furthermore, the Cramér-Rao bound gives the minimal error (variance) that even the best unbiased estimator can reach, and this precision can be insufficient for demanding applications. These limitations can be overcome by the coding approach, which consists of directly transmitting the parameters with the best precision at the minimal bitrate. However, this approach does not take advantage of the information that can be estimated from the signal, and it may require a higher bitrate and break compatibility with existing file formats. The purpose of this article is to propose a compromise, called the "informed approach", which combines analysis with (coded) side information in order to increase the precision of parameter estimation at a lower bitrate than pure coding approaches, given that the audio signal is known. The analysis problem is thus formulated in a coder/decoder configuration: at the coder, the side information is computed and inaudibly embedded into the mixture signal; at the decoder, this extra information is extracted and used to assist the analysis process. This study applies the approach to audio spectral analysis using sinusoidal modeling, a well-known model with practical applications for which theoretical bounds have been derived. This work aims at uncovering new approaches for applications that require high audio quality, and it provides a solution to challenging problems such as active listening of music, source separation, and realistic sound transformations.
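The principle of combining an estimate with a few transmitted bits can be illustrated with a toy sketch. It assumes, hypothetically, that the decoder's estimate of a single parameter (for instance the frequency of one partial) always stays strictly within a known bound `delta` of the true value, e.g. a few standard deviations derived from the Cramér-Rao bound. It uses a generic modulo (coset) coding of a uniform quantization index, a standard side-information coding trick that is not necessarily the scheme used in this article. The coder sends only the low-order bits of the quantization index, and the decoder resolves the remaining ambiguity with its own estimate. All function names and numerical values are illustrative.

```python
import math
import random


def informed_bits(q: float, delta: float) -> int:
    """Side-information bits needed so that the decoder can identify the right
    quantization cell, assuming |theta_hat - theta| < delta (hypothetical bound)."""
    # Cells sharing the same residue are M = 2**bits cells apart; the decoder
    # picks the right one as long as M >= 1 + 2 * delta / q.
    return max(0, math.ceil(math.log2(1.0 + 2.0 * delta / q)))


def encode(theta: float, lo: float, q: float, bits: int) -> int:
    """Coder side (knows the true value): quantize theta on a uniform grid of
    step q starting at lo, and keep only the low-order bits of the cell index."""
    k = math.floor((theta - lo) / q)
    return k % (1 << bits)


def decode(residue: int, theta_hat: float, lo: float, q: float, bits: int) -> float:
    """Decoder side (only has the estimate theta_hat from the analysis): select
    the cell whose index is congruent to `residue` and closest to the estimate."""
    m = 1 << bits
    u = (theta_hat - lo) / q - 0.5  # estimate expressed in cell-index units
    k = residue + m * round((u - residue) / m)
    return lo + (k + 0.5) * q  # reconstruct at the centre of the cell


if __name__ == "__main__":
    lo, hi = 20.0, 20000.0  # hypothetical frequency range (Hz)
    q = 0.01                # target precision: quantization step (Hz)
    delta = 2.0             # assumed bound on the analysis error (Hz)

    bits = informed_bits(q, delta)
    full_bits = math.ceil(math.log2((hi - lo) / q))
    print(f"informed: {bits} bits, direct coding: {full_bits} bits")

    rng = random.Random(0)
    worst = 0.0
    for _ in range(10_000):
        theta = rng.uniform(lo, hi)                             # true frequency
        theta_hat = theta + 0.999 * rng.uniform(-delta, delta)  # noisy analysis estimate
        rec = decode(encode(theta, lo, q, bits), theta_hat, lo, q, bits)
        worst = max(worst, abs(rec - theta))
    print(f"worst reconstruction error: {worst:.4f} Hz (bound q/2 = {q / 2} Hz)")
```

With these illustrative values, the side information costs 9 bits per parameter instead of the 21 bits needed to code the frequency directly at the same 0.01 Hz precision. The saving comes from the fact that the number of transmitted bits depends only on the ratio `delta / q`, not on the full parameter range: the analysis supplies the coarse part of the value, and the side information supplies only the refinement.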
