Probabilistic phase vocoder and its application to interpolation of missing values in audio signals

We formulate the phase vocoder, an audio synthesis method closely related to inverse short-time Fourier transform (STFT) synthesis, as a Gaussian state space model and demonstrate simulation results on the interpolation of missing values. The audio signal is modelled as a superposition of quasi-sinusoidal signals generated by a linear dynamical system. The advantage of this “generative” perspective is that it allows a full Bayesian treatment of the problem; for example, the analysis can be carried out while arbitrary chunks of samples are missing or model parameters are unknown. To perform audio restoration, we derive an expectation-maximisation (EM) algorithm that infers the expectations of the missing samples and maximum a posteriori (MAP) estimates of the model parameters. We demonstrate the validity of our approach on a set of challenging real audio examples and compare it with existing methods.
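As an illustration of how missing samples can be inferred in a linear Gaussian state space model, the following is a minimal sketch under assumptions the abstract does not spell out. It assumes that each quasi-sinusoid is a 2-D state rotated by a fixed angular frequency `omega` and damped by a factor `rho`, with isotropic Gaussian process noise, and that each observed sample is the sum of the first coordinates of all oscillators plus white noise. It covers only the E-step (a Kalman filter followed by a Rauch-Tung-Striebel smoother) with every parameter held fixed and known; the M-step that re-estimates parameters is omitted. The function names (`damped_rotation`, `build_model`, `kalman_smooth`) and all numerical values are hypothetical and are not the authors' code.

```python
import numpy as np


def damped_rotation(omega, rho):
    """2x2 transition of one quasi-sinusoidal oscillator: rotate by omega, shrink by rho."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])


def build_model(omegas, rho):
    """Block-diagonal transition A; observation row C sums the first coordinate of each oscillator."""
    K = len(omegas)
    A = np.zeros((2 * K, 2 * K))
    for k, w in enumerate(omegas):
        A[2 * k:2 * k + 2, 2 * k:2 * k + 2] = damped_rotation(w, rho)
    C = np.zeros((1, 2 * K))
    C[0, 0::2] = 1.0
    return A, C


def kalman_smooth(y, A, C, Q, r, m0, P0):
    """Kalman filter + Rauch-Tung-Striebel smoother; NaN samples in y are treated as missing."""
    T, n = len(y), A.shape[0]
    mp, Pp = np.zeros((T, n)), np.zeros((T, n, n))  # one-step predictions
    mf, Pf = np.zeros((T, n)), np.zeros((T, n, n))  # filtered estimates
    for t in range(T):
        if t == 0:
            mp[t], Pp[t] = m0, P0
        else:
            mp[t] = A @ mf[t - 1]
            Pp[t] = A @ Pf[t - 1] @ A.T + Q
        if np.isnan(y[t]):
            # Missing sample: skip the measurement update and keep the prediction.
            mf[t], Pf[t] = mp[t], Pp[t]
        else:
            S = (C @ Pp[t] @ C.T).item() + r  # innovation variance
            gain = (Pp[t] @ C.T)[:, 0] / S  # Kalman gain, shape (n,)
            mf[t] = mp[t] + gain * (y[t] - (C @ mp[t]).item())
            Pf[t] = Pp[t] - np.outer(gain, C @ Pp[t])
    ms, Ps = mf.copy(), Pf.copy()
    for t in range(T - 2, -1, -1):
        # Smoother gain G = Pf A^T Pp^{-1}, computed via a solve (covariances are symmetric).
        G = np.linalg.solve(Pp[t + 1], A @ Pf[t]).T
        ms[t] = mf[t] + G @ (ms[t + 1] - mp[t + 1])
        Ps[t] = Pf[t] + G @ (Ps[t + 1] - Pp[t + 1]) @ G.T
    return ms, Ps


# Toy demo (all values hypothetical): two sinusoids with a 40-sample gap.
rng = np.random.default_rng(0)
omegas = 2 * np.pi * np.array([0.05, 0.13])
t = np.arange(400)
clean = np.cos(omegas[0] * t) + 0.5 * np.cos(omegas[1] * t + 1.0)
y = clean + 0.01 * rng.standard_normal(t.size)
gap = slice(180, 220)
y[gap] = np.nan  # mark these samples as missing

A, C = build_model(omegas, rho=0.999)
n = A.shape[0]
ms, Ps = kalman_smooth(y, A, C, Q=1e-4 * np.eye(n), r=1e-4, m0=np.zeros(n), P0=np.eye(n))
y_hat = ms @ C[0]  # posterior mean of every sample, including the gap

gap_rmse = np.sqrt(np.mean((y_hat[gap] - clean[gap]) ** 2))
zero_rmse = np.sqrt(np.mean(clean[gap] ** 2))  # error of simply filling the gap with zeros
obs_rmse = np.sqrt(np.mean((y_hat[:180] - clean[:180]) ** 2))
```

Here the smoothed means inside the gap serve as the interpolated samples, and `Ps` gives their posterior covariance. In a full EM loop, the fixed `omegas`, `rho`, `Q` and `r` would instead be re-estimated from the smoothed statistics after each pass; this sketch leaves that step out.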
