Time-Frequency Analysis as Probabilistic Inference

This paper proposes a new view of time-frequency analysis framed in terms of probabilistic inference. Natural signals are assumed to be formed by the superposition of distinct time-frequency components, and the analytic goal is to infer these components by applying Bayes' rule. The framework unifies various existing models for natural time series; it relates to both the Wiener and Kalman filters, and with suitable assumptions yields inferential interpretations of the short-time Fourier transform, spectrogram, filter bank, and wavelet representations. Value is gained by placing time-frequency analysis on the same probabilistic basis as is often employed in applications such as denoising, source separation, or recognition. Uncertainty in the time-frequency representation can be propagated correctly to application-specific stages, improving the handling of noise and missing data. Probabilistic learning allows modules to be co-adapted; thus, the time-frequency representation can be adapted to both the demands of the application and the time-varying statistics of the signal at hand. Similarly, the application module can be adapted to fine properties of the signal propagated by the initial time-frequency processing. We demonstrate these benefits by combining probabilistic time-frequency representations with non-negative matrix factorization, obtaining improvements in audio denoising and inpainting tasks, albeit at a higher computational cost than the standard approach.
