A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval

This paper describes a method of modeling the characteristics of a singing voice from polyphonic musical audio signals including sounds of various musical instruments. Because singing voices play an important role in musical pieces with vocals, such representation is useful for music information retrieval systems. The main problem in modeling the characteristics of a singing voice is the negative influences caused by accompaniment sounds. To solve this problem, we developed two methods, accompaniment sound reduction and reliable frame selection . The former makes it possible to calculate feature vectors that represent a spectral envelope of a singing voice after reducing accompaniment sounds. It first extracts the harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody by using a sinusoidal model driven by these components. The latter method then estimates the reliability of frame of the obtained melody (i.e., the influence of accompaniment sound) by using two Gaussian mixture models (GMMs) for vocal and nonvocal frames to select the reliable vocal portions of musical pieces. Finally, each song is represented by its GMM consisting of the reliable frames. This new representation of the singing voice is demonstrated to improve the performance of an automatic singer identification system and to achieve an MIR system based on vocal timbre similarity.

[1]  Wei-Ho Tsai,et al.  Automatic Identification of Simultaneous Singers in Duet Recordings , 2008, ISMIR.

[2]  B. Atal Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. , 1974, The Journal of the Acoustical Society of America.

[3]  Jeroen Breebaart,et al.  Features for audio and music classification , 2003, ISMIR.

[4]  Anssi Klapuri,et al.  Singer Identification in Polyphonic Music Using Vocal Separation and Pattern Recognition Methods , 2007, ISMIR.

[5]  Daniel P. W. Ellis,et al.  A Large-Scale Evaluation of Acoustic and Subjective Music-Similarity Measures , 2004, Computer Music Journal.

[6]  George Tzanetakis,et al.  A scalable peer-to-peer system for music content and information retrieval , 2003, ISMIR.

[7]  Daniel P. W. Ellis,et al.  Classifying Music Audio with Timbral and Chroma Features , 2007, ISMIR.

[8]  Tong Zhang,et al.  Automatic singer identification , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[9]  Beth Logan,et al.  Content-Based Playlist Generation: Exploratory Experiments , 2002, ISMIR.

[10]  Oliver Hellmuth,et al.  A multiple feature model for musical similarity retrieval , 2003, ISMIR.

[11]  Hsin-Min Wang,et al.  Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Youngmoo E. Kim,et al.  Singer Identification in Popular Music Recordings Using Voice Coding Features , 2002 .

[13]  Haruhiro Katayose,et al.  Using Bass-line Features for Content-Based MIR , 2008, ISMIR.

[14]  François Pachet,et al.  Music Similarity Measures: What's the use? , 2002, ISMIR.

[15]  Douglas A. Reynolds,et al.  Integrated models of signal and background with application to speaker identification in noise , 1994, IEEE Trans. Speech Audio Process..

[16]  T. Virtanen,et al.  Probabilistic Model Based Similarity Measures for Audio Query-by-Example , 2007, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[17]  Masataka Goto,et al.  RWC Music Database: Popular, Classical and Jazz Music Databases , 2002, ISMIR.

[18]  Elias Pampalk,et al.  Computational Models of Music Similarity and their Application in Music Information Retrieval , 2006 .

[19]  Masataka Goto,et al.  Instrogram: Probabilistic Representation of Instrument Existence for Polyphonic Music , 2007 .

[20]  Gerhard Widmer,et al.  Probabilistic Combination of Features for Music Classification , 2006, ISMIR.

[21]  Masataka Goto,et al.  A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals , 2004, Speech Commun..

[22]  Tomohiro Nakatani,et al.  Harmonic sound stream segregation using localization and its application to speech stream segregation , 1999, Speech Commun..

[23]  Peter Knees,et al.  Independent Component Analysis for Music Similarity Computation , 2006, ISMIR.

[24]  Daniel P. W. Ellis,et al.  USING VOICE SEGMENTS TO IMPROVE ARTIST CLASSIFICATION OF MUSIC , 2002 .

[25]  Steve Lawrence,et al.  Artist detection in music with Minnowmatch , 2001, Neural Networks for Signal Processing XI: Proceedings of the 2001 IEEE Signal Processing Society Workshop (IEEE Cat. No.01TH8584).

[26]  Hsin-Min Wang,et al.  Automatic detection and tracking of target singer in multi-singer music recordings , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.