Paper information - Content-based singer classification on compressed domain audio data

In this paper, we propose a singer identification approach that automatically identifies the singer of an unknown MP3 audio recording. Unlike previous research on singer identification in the MP3 compressed domain, we use Mel-Frequency Cepstral Coefficients (MFCC) as the feature instead of modified discrete cosine transform (MDCT) coefficients. Although MFCC is widely used in music classification and speaker recognition, it cannot be obtained directly from compressed music data such as MP3. We therefore introduce a modified method for calculating the MFCC vector in the MP3 compressed domain. A Gaussian mixture model (GMM) describes the distribution of the MFCC vectors, and maximum likelihood classification (MLC) assigns each input MFCC vector to its nearest singer group. The experimental results verify the feasibility of the proposed approach.
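The classification stage described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each singer's GMM has already been trained and uses diagonal covariances. The toy 2-D parameters, the names `SINGER_MODELS`, `classify_vector`, and `classify_clip`, and the majority vote over per-vector decisions are all hypothetical choices made for the example. Real MFCC vectors would have more dimensions and would come from the compressed-domain extraction step, which is not shown here.

```python
import math
from collections import Counter


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp keeps the mixture sum numerically stable.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_vector(x, singer_models):
    """Assign one feature vector to the singer whose GMM scores it highest."""
    return max(
        singer_models,
        key=lambda name: gmm_log_likelihood(x, singer_models[name]),
    )


def classify_clip(vectors, singer_models):
    """Majority vote over per-vector decisions (an assumed aggregation rule)."""
    votes = Counter(classify_vector(x, singer_models) for x in vectors)
    return votes.most_common(1)[0][0]


# Hypothetical pre-trained models: two mixture components per singer, 2-D features.
SINGER_MODELS = {
    "singer_a": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 1.0], [0.5, 0.5])],
    "singer_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [0.5, 0.5])],
}

# Three toy feature vectors standing in for the MFCC vectors of one clip.
clip = [[0.2, -0.1], [0.9, 1.1], [5.1, 4.8]]
predicted = classify_clip(clip, SINGER_MODELS)
```

Here two of the three toy vectors lie close to the `singer_a` components, so the vote selects `singer_a`. Summing log-likelihoods over all vectors instead of voting would be an equally plausible aggregation; the abstract does not say which one the paper uses.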

Tsung-Han Tsai | Yu-Siang Huang | Pei-Yun Liu | De-Ming Chen
