Phase vocoder done right

The phase vocoder (PV) is a widely used technique for processing audio signals. It employs a short-time Fourier transform (STFT) analysis-modify-synthesis loop and is typically used for time-scaling of signals by using different time steps for STFT analysis and synthesis. The main challenge of using the PV for that purpose is the correction of the STFT phase. In this paper, we introduce a novel method for phase correction based on phase gradient estimation and its integration. The method requires neither explicit peak picking and tracking nor detection of transients and their separate treatment. Yet it does not suffer from the typical phase vocoder artifacts, even for extreme time-stretching factors.
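To make the analysis-modify-synthesis loop concrete, the following is a minimal sketch of the classical phase vocoder with plain per-bin phase propagation. It is not the phase-gradient integration method introduced in the paper; it only illustrates how different analysis and synthesis time steps produce time-scaling and why the phase has to be corrected. The function names (`stft`, `istft`, `time_stretch`), the Hann window, and the default parameter values are assumptions chosen for illustration.

```python
import numpy as np

def stft(x, win, hop):
    """Frame the signal with step `hop`, window each frame, and take the FFT."""
    n = len(win)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop:i * hop + n] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, win, hop):
    """Weighted overlap-add synthesis with step `hop`."""
    n = len(win)
    frames = np.fft.irfft(spec, n=n, axis=1)
    out = np.zeros(n + hop * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n] += frame * win
        norm[i * hop:i * hop + n] += win ** 2
    # The floor avoids dividing by near-zero window energy at the signal edges.
    return out / np.maximum(norm, 1e-3)

def time_stretch(x, factor, n_fft=1024, hop_s=256):
    """Stretch x in time by `factor` (classical phase propagation, assumed setup)."""
    hop_a = max(1, int(round(hop_s / factor)))   # analysis step
    win = np.hanning(n_fft)
    spec = stft(x, win, hop_a)
    mag, phase = np.abs(spec), np.angle(spec)
    # Centre frequency of each bin in radians per sample.
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    out_phase = np.empty_like(phase)
    out_phase[0] = phase[0]
    for t in range(1, len(spec)):
        # Deviation of the measured phase advance from the bin-centre advance,
        # wrapped to [-pi, pi].
        dphi = phase[t] - phase[t - 1] - omega * hop_a
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        inst_freq = omega + dphi / hop_a          # estimated instantaneous frequency
        # Phase correction: advance by the synthesis step instead of the analysis step.
        out_phase[t] = out_phase[t - 1] + inst_freq * hop_s
    return istft(mag * np.exp(1j * out_phase), win, hop_s)

# Usage: stretch one second of a 440 Hz tone to roughly twice its length.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
stretched = time_stretch(tone, 2.0)
```

Because each frequency bin is propagated independently here, this sketch exhibits the typical phase vocoder artifacts on signals with transients or many partials, which is the problem that phase correction methods set out to address.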
