Paper information - Time-Frequency Analysis as Probabilistic Inference

This paper proposes a new view of time-frequency analysis framed in terms of probabilistic inference. Natural signals are assumed to be formed by the superposition of distinct time-frequency components, with the analytic goal being to infer these components by application of Bayes' rule. The framework serves to unify various existing models for natural time-series; it relates to both the Wiener and Kalman filters, and with suitable assumptions yields inferential interpretations of the short-time Fourier transform, spectrogram, filter bank, and wavelet representations. Value is gained by placing time-frequency analysis on the same probabilistic basis as is often employed in applications such as denoising, source separation, or recognition. Uncertainty in the time-frequency representation can be propagated correctly to application-specific stages, improving the handling of noise and missing data. Probabilistic learning allows modules to be co-adapted; thus, the time-frequency representation can be adapted to both the demands of the application and the time-varying statistics of the signal at hand. Similarly, the application module can be adapted to fine properties of the signal propagated by the initial time-frequency processing. We demonstrate these benefits by combining probabilistic time-frequency representations with non-negative matrix factorization, finding benefits in audio denoising and inpainting tasks, albeit with higher computational cost than incurred by the standard approach.
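As a rough illustration of the link between Bayes' rule and the Wiener filter mentioned in the abstract, the sketch below infers a single clean time-frequency coefficient from a noisy observation. It is not the paper's model: it assumes one coefficient with a zero-mean Gaussian prior and independent additive Gaussian noise, and the names `Posterior`, `infer_coefficient`, `prior_var`, and `noise_var` are hypothetical. Under those assumptions the posterior mean is the observation scaled by the Wiener gain, and the posterior variance is the uncertainty that a later processing stage could use.

```python
from dataclasses import dataclass


@dataclass
class Posterior:
    mean: complex     # posterior mean of the clean coefficient
    variance: float   # posterior variance (remaining uncertainty)


def infer_coefficient(y: complex, prior_var: float, noise_var: float) -> Posterior:
    """Posterior over a clean coefficient x given y = x + noise.

    Assumption (for illustration only): x ~ N(0, prior_var) and
    noise ~ N(0, noise_var), independent of each other.
    """
    if prior_var < 0 or noise_var <= 0:
        raise ValueError("prior_var must be >= 0 and noise_var must be > 0")
    # Bayes' rule for two Gaussians gives the classic Wiener gain.
    gain = prior_var / (prior_var + noise_var)
    return Posterior(mean=gain * y, variance=gain * noise_var)


# A reliably observed coefficient: the estimate stays close to the data.
observed = infer_coefficient(2.0 + 1.0j, prior_var=3.0, noise_var=1.0)

# A missing coefficient, modeled here as an observation with enormous noise:
# the posterior falls back to the prior (mean 0, variance close to prior_var).
missing = infer_coefficient(0.0, prior_var=3.0, noise_var=1e12)
```

Treating a missing sample as an observation with very large noise variance is one simple way to see how the same inference step can cover both denoising and inpainting; the posterior variance then signals how little is known about that coefficient.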

Richard E. Turner | Maneesh Sahani

[1] Douglas L. Jones, et al. A signal-dependent time-frequency representation: optimal kernel design, 1993, IEEE Trans. Signal Process.

[2] Richard E. Turner, et al. A Structured Model of Video Reproduces Primary Visual Cortical Organisation, 2009, PLoS Comput. Biol.

[3] Douglas L. Jones, et al. A high resolution data-adaptive time-frequency representation, 1990, IEEE Trans. Acoust. Speech Signal Process.

[4] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[5] Zoubin Ghahramani, et al. Sparse Gaussian Processes using Pseudo-inputs, 2005, NIPS.

[6] M. Sahani, et al. Demodulation as Probabilistic Inference, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[7] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.

[8] Robert M. Gray, et al. Toeplitz and Circulant Matrices: A Review, 2005, Found. Trends Commun. Inf. Theory.

[9] Kaare Brandt Petersen, et al. On the Slow Convergence of EM and VBEM in Low-Noise Linear Models, 2005, Neural Computation.

[10] Roland Badeau. Gaussian modeling of mixtures of non-stationary signals in the Time-Frequency domain (HR-NMF), 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[11] Carl E. Rasmussen, et al. A Unifying View of Sparse Approximate Gaussian Process Regression, 2005, J. Mach. Learn. Res.

[12] Antoine Liutkus, et al. Gaussian Processes for Underdetermined Source Separation, 2011, IEEE Transactions on Signal Processing.

[13] Norbert Wiener, et al. Extrapolation, Interpolation, and Smoothing of Stationary Time Series, 1964.

[14] S. Godsill, et al. Prior Structures for Time-Frequency Energy Distributions, 2007, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[15] Emmanuel Vincent, et al. Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription, 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[16] Khaled H. Hamed, et al. Time-frequency analysis, 2003.

[17] J. L. Flanagan, et al. Parametric coding of speech spectra, 1980.

[18] Simon J. Godsill, et al. Bayesian Interpolation and Parameter Estimation in a Dynamic Sinusoidal Model, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[19] Andries P. Hekstra, et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[20] G. Larry Bretthorst. Bayesian Spectrum Analysis and Parameter Estimation, 1988.

[21] Douglas L. Jones, et al. An adaptive optimal-kernel time-frequency representation, 1995, IEEE Trans. Signal Process.

[22] Jae S. Lim, et al. Signal estimation from modified short-time Fourier transform, 1983, ICASSP.

[23] Nancy Bertin, et al. Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, 2009, Neural Computation.

[24] Les E. Atlas, et al. Optimizing time-frequency kernels for classification, 2001, IEEE Trans. Signal Process.

[25] Christophe Andrieu, et al. Online Bayesian Inference in Some Time-Frequency Representations of Non-Stationary Processes, 2013, IEEE Transactions on Signal Processing.

[26] S. Boll, et al. Suppression of acoustic noise in speech using spectral subtraction, 1979.

[27] Brendan J. Frey, et al. Probabilistic Inference of Speech Signals from Phaseless Spectrograms, 2003, NIPS.

[28] Les E. Atlas, et al. Modulation decompositions for the interpolation of long gaps in acoustic signals, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[29] Roland Badeau, et al. Variational Bayesian EM algorithm for modeling mixtures of non-stationary signals in the time-frequency domain (HR-NMF), 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[30] Hirokazu Kameoka, et al. Complex NMF under spectrogram consistency constraints, 2009.

[31] Malcolm Slaney, et al. Solving Demodulation as an Optimization Problem, 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[32] Ali Taylan Cemgil, et al. Conjugate Gamma Markov Random Fields for Modelling Nonstationary Sources, 2007, ICA.

[33] Simon J. Godsill, et al. Probabilistic phase vocoder and its application to interpolation of missing values in audio signals, 2005, 2005 13th European Signal Processing Conference.

[34] A. W. M. van den Enden, et al. Discrete Time Signal Processing, 1989.

[35] John R. Hershey, et al. Signal interaction and the devil function, 2010, INTERSPEECH.

[36] P. Smaragdis, et al. Non-negative matrix factorization for polyphonic music transcription, 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[37] Richard E. Turner, et al. Probabilistic Amplitude Demodulation, 2007, ICA.

[38] Emmanuel Vincent, et al. Fast Bayesian NMF algorithms enforcing harmonicity and temporal continuity in polyphonic music transcription, 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[39] Richard E. Turner, et al. Probabilistic amplitude and frequency demodulation, 2011, NIPS.

[40] Yu Huang, et al. Time-Frequency Representation Based on an Adaptive Short-Time Fourier Transform, 2010, IEEE Transactions on Signal Processing.

[41] E. Jaynes. Information Theory and Statistical Mechanics, 1957.

[42] Tuomas Virtanen, et al. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[43] Michael S. Lewicki, et al. A Hierarchical Bayesian Model for Learning Nonlinear Statistical Regularities in Nonstationary Natural Signals, 2005, Neural Computation.

[44] Richard E. Turner, et al. A Maximum-Likelihood Interpretation for Slow Feature Analysis, 2007, Neural Computation.

[45] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[46] Pascal Scalart, et al. Speech enhancement based on a priori signal to noise estimation, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[47] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive computation and machine learning.

[48] Leon Cohen, et al. Time Frequency Analysis: Theory and Applications, 1994.

[49] Alan L. Yuille, et al. The g Factor: Relating Distributions on Features to Distributions on Images, 2001, NIPS.

[50] T. Başar, et al. A New Approach to Linear Filtering and Prediction Problems, 2001.

[51] Stéphane Mallat, et al. Audio Denoising by Time-Frequency Block Thresholding, 2008, IEEE Transactions on Signal Processing.

[52] Jin Jiang, et al. Time-frequency feature representation using energy concentration: An overview of recent advances, 2009, Digit. Signal Process.

[53] Richard E. Turner. Statistical models for natural sounds, 2010.

[54] Yuan Qi, et al. Bayesian spectrum estimation of unevenly sampled nonstationary data, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.