Multiresolution sinusoidal model with dynamic segmentation for timescale modification of polyphonic audio signals

In this paper, we propose an efficient sinusoidal model of polyphonic audio signals that is particularly well suited to timescale modification. A critical problem of sinusoidal modeling is that the signal is smeared within the synthesis frame, which is especially undesirable for transients. We address this problem by introducing multiresolution analysis-synthesis and dynamic segmentation. A signal is modeled as the sum of a sinusoidal component and a noise component. A multiresolution filter bank splits the input signal into octave-spaced subbands without causing aliasing, and sinusoidal analysis is then applied to each subband signal. To alleviate smearing of transients during synthesis, a dynamic segmentation method is applied to each subband signal; it adaptively determines the optimal analysis-synthesis frame size to fit the time-frequency characteristics of that subband. To extract the sinusoidal components and calculate their parameters, a matching pursuit algorithm is applied to each analysis frame of the subband signal. A psychoacoustic model implementing frequency masking is incorporated into the matching pursuit to provide a reasonable stopping condition for the iteration and to reduce the number of sinusoids. The noise component, obtained by subtracting the signal synthesized from the sinusoidal components from the original signal, is modeled by a line-segment model of the short-time spectral envelope. Simulation results for various polyphonic audio signals show that the proposed sinusoidal model can resynthesize the original signals without loss of perceptual quality and achieves more robust, higher-quality timescale modification at large scale factors.
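The matching pursuit step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary here is assumed to contain only sinusoids at DFT bin frequencies, and a simple residual-energy threshold stands in for the psychoacoustic masking stop condition. The function name `matching_pursuit_sinusoids` and the parameters `max_sinusoids` and `stop_ratio` are hypothetical.

```python
import cmath
import math


def matching_pursuit_sinusoids(frame, max_sinusoids=10, stop_ratio=1e-3):
    """Greedily extract sinusoids from one real-valued analysis frame.

    Assumptions (not from the paper): atoms sit on DFT bin frequencies,
    and iteration stops when the residual energy falls below
    stop_ratio times the frame energy, in place of a masking model.

    Returns (params, residual), where params is a list of
    (frequency in cycles/sample, amplitude, phase in radians).
    """
    n = len(frame)
    residual = list(frame)
    initial_energy = sum(x * x for x in residual)
    params = []

    for _ in range(max_sinusoids):
        energy = sum(x * x for x in residual)
        if initial_energy == 0.0 or energy <= stop_ratio * initial_energy:
            break  # stand-in stop condition

        # Correlate the residual with every complex exponential atom
        # and keep the one with the largest magnitude.
        best_k, best_c = None, 0j
        for k in range(1, n // 2):
            c = sum(residual[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        if best_k is None:
            break

        # For a real sinusoid on bin k, the correlation is (A*n/2)*exp(j*phi).
        amp = 2.0 * abs(best_c) / n
        phase = cmath.phase(best_c)

        # Subtract the selected sinusoid from the residual.
        residual = [residual[t]
                    - amp * math.cos(2.0 * math.pi * best_k * t / n + phase)
                    for t in range(n)]
        params.append((best_k / n, amp, phase))

    return params, residual


# Usage: a frame made of two sinusoids on bins 5 and 12 of a 64-sample frame.
n = 64
frame = [1.0 * math.cos(2 * math.pi * 5 * t / n + 0.3)
         + 0.5 * math.cos(2 * math.pi * 12 * t / n - 1.0)
         for t in range(n)]
params, residual = matching_pursuit_sinusoids(frame)
for freq, amp, phase in params:
    print(f"freq={freq:.4f} cycles/sample, amp={amp:.3f}, phase={phase:.3f}")
```

In this sketch the residual left after the loop plays the role of the noise component; a fuller version would refine frequencies between bins and replace the energy threshold with a masking-based test.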
