Multichannel Audio Source Separation With Probabilistic Reverberation Priors

Incorporating prior knowledge about the sources and/or the mixture can improve under-determined audio source separation. Many informed source separation techniques exploit priors on the sources, but fewer works have focused on constraining the mixing model. In this paper, we address under-determined multichannel audio source separation in reverberant conditions. We target a semi-informed scenario where some room parameters are known. Two probabilistic priors on the frequency response of the mixing filters are proposed. Early reverberation is characterized by an autoregressive model, while late reverberation is represented by an autoregressive moving average model, in accordance with results from statistical room acoustics. Both reverberation models are defined in the frequency domain; they aim to translate the temporal characteristics of the mixing filters into frequency-domain correlations. Our approach leads to a maximum a posteriori estimation of the mixing filters, which is carried out with the expectation-maximization algorithm. We show experimentally that this approach outperforms maximum likelihood estimation of the mixing filters.
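
To illustrate how temporal structure in a mixing filter can show up as an autoregressive relation across frequency, here is a minimal noiseless sketch. It is not the probabilistic prior proposed in the paper. It assumes an idealized filter made of a few isolated echoes at integer sample delays, in which case the frequency response is a sum of complex exponentials and satisfies an exact linear recursion across frequency bins. The function name `ar_residual`, the delays, the gains, and the FFT length are all hypothetical choices made for illustration.

```python
import numpy as np

def ar_residual(delays, gains, n_fft=256):
    """Largest AR prediction error across frequency bins for a sparse echo filter.

    Assumption (illustrative only): the filter consists of isolated echoes at
    distinct integer sample delays, with no noise and no late reverberation.
    """
    delays = np.asarray(delays)
    h = np.zeros(n_fft)
    h[delays] = gains                      # time-domain filter: a few echoes
    H = np.fft.fft(h)                      # H[f] = sum_k gains[k] * z_k**f
    z = np.exp(-2j * np.pi * delays / n_fft)  # one pole per echo
    a = np.poly(z)                         # monic polynomial with roots z_k
    K = len(delays)
    # AR recursion of order K: sum_{p=0..K} a[p] * H[f - p] = 0 for f >= K
    resid = sum(a[p] * H[K - p : n_fft - p] for p in range(K + 1))
    return np.max(np.abs(resid))

# Three echoes give an order-3 recursion that holds up to rounding error.
print(ar_residual([3, 17, 40], [1.0, 0.6, 0.3]))
```

The residual is zero up to floating-point error because each echo contributes one pole on the unit circle. A probabilistic version would instead treat the prediction error as a random term, which is the kind of relaxation a prior on the mixing filters allows.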
