NMF-based environmental sound source separation using time-variant gain features

Environmental sounds of many kinds surround us in daily life, and environmental sound recognition has recently drawn considerable attention as a way to understand our surroundings. However, because a recorded environmental sound is usually a mixture of multiple sound sources, it is difficult to recognize accurately. Separating the sound sources as a pre-processing step before recognition would make recognition easier and more accurate. The recording devices we assume are mobile devices, which are widely equipped with monaural microphones, so this paper proposes a monaural sound source separation method for environmental sounds. The method separates monaural sound sources by two-phase clustering using non-negative matrix factorization (NMF), and it uses the time-variant gain feature as an attribute of an environmental sound for more efficient separation.
