Automatic singer identification based on auditory features

This paper describes a method for identifying a singer's voice, based on auditory features, in monophonic music that includes the sounds of various musical instruments. The system must solve four problems: vocal segment detection, feature extraction, modeling of the singing voice, and identification. For a song to be identified, the vocal and nonvocal segments are first detected with a new classifier, Sparse Representation-based Classification (SRC). Feature extraction is the most important step. Because the human ear can distinguish among different types of sounds, auditory features are well suited to describing a singer's voice. To capture them, we compute for each frame the Mel-frequency Cepstral Coefficients (MFCC), Linear Prediction Mel-frequency Cepstral Coefficients (LPMCC), and Gammatone Cepstral Coefficients (GTCC). Finally, we use a Gaussian Mixture Model (GMM) to model each singer's voice. The system is demonstrated to improve the performance of automatic singer identification in Music Information Retrieval (MIR).
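The final identification stage can be illustrated with a minimal sketch. The abstract does not state the decision rule, so this assumes a common one: score the vocal frames of a song under each singer's GMM and pick the singer with the highest average log-likelihood. The diagonal covariances, the two-dimensional toy features, and every name and number here (`SINGER_MODELS`, `VOCAL_FRAMES`, `identify_singer`) are hypothetical; in the described system the frames would be MFCC, LPMCC, and GTCC vectors taken from segments that SRC labels as vocal, and the GMM parameters would be learned from training songs.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_singer(frames, singer_models):
    """Return the best-scoring singer and each singer's average frame log-likelihood."""
    scores = {
        name: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for name, gmm in singer_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-component models over 2-D toy features (assumed, not from the paper).
SINGER_MODELS = {
    "singer_a": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 1.0], [0.5, 0.5])],
    "singer_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 3.0], [0.5, 0.5])],
}

# Toy stand-ins for per-frame feature vectors from detected vocal segments.
VOCAL_FRAMES = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.3]]

best, scores = identify_singer(VOCAL_FRAMES, SINGER_MODELS)
print(best)  # prints "singer_a"
```

Averaging over frames keeps scores comparable between songs with different numbers of vocal frames; it does not change which singer ranks first for a given song.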
