Paper information - Probabilistic phase vocoder and its application to interpolation of missing values in audio signals

We formulate the phase vocoder, an audio synthesis method very closely related to inverse short-time Fourier transform synthesis, as a Gaussian state space model and demonstrate simulation results on the interpolation of missing values. The audio signal is modelled as a superposition of quasi-sinusoidal signals generated by a linear dynamical system. The advantage of this “generative” perspective is that it allows a full Bayesian treatment of the problem; for example, one can perform the analysis while arbitrary chunks of sample values are missing or while model parameters are unknown. To perform audio restoration, we derive an expectation-maximisation (EM) algorithm that infers the expectations of the missing samples and the maximum a posteriori model parameters. We demonstrate the validity of our approach on a set of challenging real audio examples and compare it with existing methods.
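To make the state space view concrete, the following is a minimal sketch, not the authors' implementation. It assumes each quasi-sinusoid is a two-dimensional state advanced by a damped rotation, that the observed sample is the sum of the first state components plus Gaussian noise, and that the frequencies and noise variances are known and fixed. The paper instead estimates model parameters with EM; here only the smoothing step that fills in missing samples is shown. All names (`build_model`, `kalman_smooth`, `rho`, `q`, `r`) and all numeric values are hypothetical choices for illustration.

```python
import numpy as np

def build_model(omegas, rho=0.999):
    """Block-diagonal transition matrix of damped rotations, one 2x2 block per
    oscillator, and an observation vector that sums the first component of each."""
    n = 2 * len(omegas)
    A = np.zeros((n, n))
    for k, w in enumerate(omegas):
        c, s = np.cos(w), np.sin(w)
        A[2 * k:2 * k + 2, 2 * k:2 * k + 2] = rho * np.array([[c, -s], [s, c]])
    C = np.zeros(n)
    C[0::2] = 1.0
    return A, C

def kalman_smooth(y, observed, A, C, q=1e-5, r=1e-2):
    """Kalman filter plus Rauch-Tung-Striebel smoother for a scalar observation.
    Samples with observed[t] == False are treated as missing: the measurement
    update is skipped, so the state is only propagated through the model."""
    T, n = len(y), A.shape[0]
    Q = q * np.eye(n)                      # assumed isotropic process noise
    m_pred = np.zeros((T, n)); P_pred = np.zeros((T, n, n))
    m_filt = np.zeros((T, n)); P_filt = np.zeros((T, n, n))
    m, P = np.zeros(n), np.eye(n)          # assumed prior on the initial state
    for t in range(T):
        if t > 0:                          # time update (prediction)
            m, P = A @ m, A @ P @ A.T + Q
        m_pred[t], P_pred[t] = m, P
        if observed[t]:                    # measurement update
            S = C @ P @ C + r              # innovation variance (scalar)
            K = P @ C / S                  # Kalman gain
            m = m + K * (y[t] - C @ m)
            P = P - np.outer(K, C @ P)
        m_filt[t], P_filt[t] = m, P
    m_smooth = m_filt.copy()
    for t in range(T - 2, -1, -1):         # backward pass, means only
        G = np.linalg.solve(P_pred[t + 1], A @ P_filt[t]).T
        m_smooth[t] = m_filt[t] + G @ (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth @ C                    # expected sample value at every t

# Synthetic test signal: two sinusoids with a gap of 50 missing samples.
T = 400
omegas = 2 * np.pi * np.array([0.03, 0.07])   # assumed known frequencies
t_axis = np.arange(T)
clean = np.sin(omegas[0] * t_axis) + 0.5 * np.sin(omegas[1] * t_axis + 0.7)
rng = np.random.default_rng(0)
y = clean + 0.01 * rng.standard_normal(T)
observed = np.ones(T, dtype=bool)
observed[150:200] = False
y[~observed] = 0.0                            # placeholder, ignored by the filter

A, C = build_model(omegas)
y_hat = kalman_smooth(y, observed, A, C)
rmse_gap = np.sqrt(np.mean((y_hat[~observed] - clean[~observed]) ** 2))
```

Because the smoother combines information from both sides of the gap, `y_hat[150:200]` provides the expected values of the missing samples under this toy model. A full treatment along the lines of the abstract would alternate such a smoothing pass with parameter updates inside an EM loop.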

Simon J. Godsill | Ali Taylan Cemgil
