Lp-norm non-negative matrix factorization and its application to singing voice enhancement

Measures of sparsity are useful in many areas of audio signal processing, including speech enhancement, audio coding, and singing voice enhancement. A well-known method for these applications is non-negative matrix factorization (NMF), which decomposes a non-negative data matrix into the product of two non-negative matrices. Although previous studies on NMF have focused on the sparsity of the two factor matrices, the sparsity of the reconstruction errors between the data matrix and the product of the two matrices is also important, since choosing a sparsity measure for the errors is equivalent to assuming the nature of the errors. We propose a new NMF technique, which we call Lp-norm NMF, that minimizes the Lp norm of the reconstruction errors, and we derive a computationally efficient algorithm for Lp-norm NMF based on the auxiliary function principle. This algorithm can be generalized to the factorization of a real-valued matrix into the product of two real-valued matrices. We apply the algorithm to singing voice enhancement and show that an adequate choice of p improves the enhancement.
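
To make the objective concrete, the following is a minimal sketch of minimizing the Lp norm of the reconstruction errors between a non-negative matrix Y and a product WH. It is not the algorithm derived in the paper, whose update rules are not given in this abstract. It swaps in a standard iteratively reweighted scheme: for 0 < p <= 2, each |e|^p term is bounded by a weighted quadratic, and the resulting weighted least-squares NMF is handled with multiplicative updates. The function name `lp_nmf`, the smoothing constant `eps`, and all parameter defaults are assumptions chosen for illustration.

```python
import numpy as np


def lp_nmf(Y, rank, p=1.0, n_iter=200, eps=1e-8, seed=0):
    """Illustrative Lp-norm NMF via iteratively reweighted multiplicative updates.

    Approximately minimizes sum(|Y - W @ H| ** p) over non-negative W and H,
    assuming 0 < p <= 2. This is a hypothetical sketch, not the paper's method.
    """
    if not 0.0 < p <= 2.0:
        raise ValueError("this sketch assumes 0 < p <= 2")
    rng = np.random.default_rng(seed)
    n_rows, n_cols = Y.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps

    def weights(W, H):
        # Quadratic-majorizer weights |e|^(p-2), smoothed by eps to avoid
        # division by zero when a residual vanishes.
        residual = Y - W @ H
        return (residual ** 2 + eps) ** ((p - 2.0) / 2.0)

    def cost(W, H):
        return float(np.sum(np.abs(Y - W @ H) ** p))

    history = [cost(W, H)]  # cost at initialization
    for _ in range(n_iter):
        # Weighted least-squares multiplicative update for W.
        omega = weights(W, H)
        W *= ((omega * Y) @ H.T) / ((omega * (W @ H)) @ H.T + eps)
        # Recompute the weights, then update H in the same way.
        omega = weights(W, H)
        H *= (W.T @ (omega * Y)) / (W.T @ (omega * (W @ H)) + eps)
        history.append(cost(W, H))
    return W, H, history


if __name__ == "__main__":
    # Synthetic low-rank data with a few large, sparse errors added.
    rng = np.random.default_rng(1)
    Y = rng.random((20, 4)) @ rng.random((4, 30))
    Y[rng.random(Y.shape) < 0.05] += 5.0
    for p in (2.0, 1.0, 0.5):
        W, H, history = lp_nmf(Y, rank=4, p=p, n_iter=100)
        print(f"p={p}: initial cost {history[0]:.3f}, final cost {history[-1]:.3f}")
```

With p = 2 the weights are all one and the updates reduce to the familiar multiplicative rules for the squared-error NMF; smaller p down-weights large residuals, which corresponds to assuming sparser errors.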
