Computational auditory induction by missing-data non-negative matrix factorization

The human auditory system has the ability, known as auditory induction, to estimate the missing parts of a continuous auditory stream that is briefly masked by noise and to perceptually resynthesize them. Humans are thus able to simultaneously analyze an auditory scene and reconstruct the underlying signal. In this article, we formulate this ability as a non-negative matrix factorization (NMF) problem with unobserved data and show how to solve it using an auxiliary function method. We explain how this method can also be related more generally to the EM algorithm, which enables the use of prior distributions on the parameters. We show that sparseness is key to global feature extraction and that our method can, ideally, extract patterns that never occur in complete form. Finally, we illustrate with an example how our method simultaneously analyzes a scene and fills in its gaps.
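As a rough illustration of NMF with unobserved data, the sketch below fits a factorization V ≈ WH using only the observed entries of a non-negative matrix, then uses the model to fill in the missing ones. It is a minimal sketch under stated assumptions, not the method of the article: it assumes a squared Euclidean cost and the standard masked multiplicative updates, and it omits the sparseness prior and any temporal structure. The names `masked_nmf` and `fill_gaps`, the binary mask `M`, and all parameter values are hypothetical.

```python
import numpy as np


def masked_nmf(V, M, rank, n_iter=200, eps=1e-12, seed=0):
    """Fit V ~ W @ H on observed entries only (M is True where V is observed).

    Assumption: squared Euclidean cost with masked multiplicative updates.
    Returns W, H and the masked cost before and after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    Mf = np.asarray(M, dtype=float)
    # Zero out unobserved entries so that NaNs or junk values are never used.
    MV = np.where(M, V, 0.0)

    # Strictly positive random initialization.
    W = rng.random((n_rows, rank)) + 0.1
    H = rng.random((rank, n_cols)) + 0.1

    def cost():
        return 0.5 * np.sum(Mf * (MV - W @ H) ** 2)

    costs = [cost()]
    for _ in range(n_iter):
        # Update the basis W, with the model masked like the data.
        WH = W @ H
        W *= (MV @ H.T) / ((Mf * WH) @ H.T + eps)
        # Update the activations H with the refreshed model.
        WH = W @ H
        H *= (W.T @ MV) / (W.T @ (Mf * WH) + eps)
        costs.append(cost())
    return W, H, costs


def fill_gaps(V, M, W, H):
    """Keep observed entries of V and take the model W @ H elsewhere."""
    return np.where(M, V, W @ H)


if __name__ == "__main__":
    # Synthetic low-rank "spectrogram" with a rectangular gap (illustrative only).
    rng = np.random.default_rng(1)
    V = rng.random((12, 3)) @ rng.random((3, 20))
    M = np.ones(V.shape, dtype=bool)
    M[4:8, 8:12] = False  # entries treated as unobserved
    W, H, costs = masked_nmf(V, M, rank=3)
    filled = fill_gaps(V, M, W, H)
    print("masked cost: %.4g -> %.4g" % (costs[0], costs[-1]))
```

In this simplified form, a column of V with no observed entries places no constraint on the matching column of H, so the sketch can only fill gaps that leave part of each frame visible.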
