Separation of Singing Voice Using Nonnegative Matrix Partial Co-Factorization for Singer Identification

In order to improve the performance of singer identification, we propose a system to separate singing voice from music accompaniment for monaural recordings. Our system consists of two key stages. The first stage exploits the nonnegative matrix partial co-factorization (NMPCF), which is a joint matrix decomposition integrating prior knowledge of singing voice and pure accompaniment to separate the mixture signal into singing voice portion and accompaniment portion. In the second stage, based on the separated singing voice obtained by the first stage, the pitches of singing voice are first estimated and then the harmonic components of singing voice can be distinguished. For a frame, the distinguished harmonic components are regarded as reliable while other frequency components unreliable, thus the spectrum is incomplete. With those harmonic components, the complete spectrums of singing voice can be reconstructed by a missing feature method, spectrum reconstruction, obtaining a refined signal with more clean singing voice. Experimental results demonstrate that, from the point view of source separation, the singing voice refinement can further improve ΔSNR in contrast with the singing voice separation using NMPCF, while for the point view of singer identification, the singing voice separated by NMPCF is more appropriate than the refined singing voice.

[1]  Ching-Wei Chen,et al.  Improving melody extraction using Probabilistic Latent Component Analysis , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Richard Polfreman,et al.  H-Semantics: A Hybrid Approach to Singing Voice Separation , 2012 .

[3]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[4]  Preeti Rao,et al.  Singing voice detection in polyphonic music using predominant pitch , 2009, INTERSPEECH.

[5]  Guy J. Brown,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[6]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Richard M. Stern,et al.  Reconstruction of missing features for robust speech recognition , 2004, Speech Commun..

[8]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[9]  DeLiang Wang,et al.  On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis , 2005, Speech Separation by Humans and Machines.

[10]  DeLiang Wang,et al.  Separation of Singing Voice From Music Accompaniment for Monaural Recordings , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  Wei-Ho Tsai,et al.  Background Music Removal Based on Cepstrum Transformation for Popular Singer Identification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  A.P. Klapuri,et al.  A perceptually motivated multiple-F0 estimation method , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[13]  Andrzej Cichocki,et al.  New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[14]  Paris Smaragdis,et al.  Singing-voice separation from monaural recordings using robust principal component analysis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Anssi Klapuri,et al.  Accompaniment separation and karaoke application based on automatic melody transcription , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[16]  Ali Taylan Cemgil,et al.  Score guided musical source separation using Generalized Coupled Tensor Factorization , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[17]  DeLiang Wang,et al.  Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  DeLiang Wang,et al.  A Supervised Learning Approach to Monaural Segregation of Reverberant Speech , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[19]  Haizhou Li,et al.  Exploring Perceptual Based Timbre Feature for Singer Identification , 2007, CMMR.

[20]  Tuomas Virtanen,et al.  Combining pitch-based inference and non-negative spectrogram factorization in separating vocals from polyphonic music , 2008, SAPA@INTERSPEECH.

[21]  Minje Kim,et al.  Nonnegative matrix partial co-factorization for drum source separation , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22]  Changshui Zhang,et al.  Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[23]  Jyh-Shing Roger Jang,et al.  On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Minje Kim,et al.  Blind rhythmic source separation: Nonnegativity and repeatability , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25]  DeLiang Wang,et al.  Monaural speech segregation based on pitch tracking and amplitude modulation , 2002, IEEE Transactions on Neural Networks.

[26]  DeLiang Wang,et al.  CASA-Based Robust Speaker Identification , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[27]  Francis R. Bach,et al.  Semi-supervised NMF with Time-frequency Annotations for Single-channel Source Separation , 2012, ISMIR.

[28]  Bhiksha Raj SEPARATING A FOREGROUND SINGER FROM , 2006 .

[29]  Guizhong Liu,et al.  Singer identification based on computational auditory scene analysis and missing feature methods , 2013, Journal of Intelligent Information Systems.

[30]  Emmanuel Vincent,et al.  Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[31]  Anssi Klapuri,et al.  Sound source separation in monaural music signals using excitation-filter model and em algorithm , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[32]  Minje Kim,et al.  Nonnegative Matrix Partial Co-Factorization for Spectral and Temporal Drum Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[33]  Bhiksha Raj,et al.  Probabilistic Latent Variable Models as Nonnegative Factorizations , 2008, Comput. Intell. Neurosci..

[34]  Kian-Lee Tan,et al.  A novel framework for efficient automated singer identification in large music databases , 2009, TOIS.

[35]  Preeti Rao,et al.  Vocal Melody Extraction in the Presence of Pitched Accompaniment in Polyphonic Music , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[36]  Phil D. Green,et al.  Robust automatic speech recognition with missing and unreliable acoustic data , 2001, Speech Commun..

[37]  Hiromasa Fujihara,et al.  A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval , 2010, IEEE Transactions on Audio, Speech, and Language Processing.