Singer Gender Classification using Feature-based and Spectrograms with Deep Convolutional Neural Network

Music information retrieval (MIR) is gaining importance as the volume of music in the digital cloud grows rapidly. An important attribute in MIR is the singer identity, which is valuable during the recommendation process. Identifying a singer from music is difficult because the number of singers represented in the digital cloud is large. Identifying the gender of a singer first may simplify singer identification and also helps with recommendation. Hence, this work attempts to detect the gender of a singer. Two datasets are considered: one collected from the Indian film industries, covering 20 singers across four regional languages, and the standard Artist20 dataset. Various spectral, temporal, and pitch-related features are used to improve accuracy. The features considered are Mel-frequency cepstral coefficients (MFCCs), pitch, and the velocity and acceleration of the MFCCs. Experiments are carried out on various combinations of these features using artificial neural networks (ANNs) and random forests (RFs). Further, genetic algorithm-based feature selection (GAFS) is used to select suitable features from the best combination obtained. In addition, convolutional neural networks (CNNs) trained on spectrograms are used to improve accuracy over the traditional feature vectors. An average accuracy of 91.70% is obtained for both the Indian and Western clips, an improvement of 3% over hand-engineered features.
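The velocity and acceleration of MFCCs mentioned above are commonly computed as first- and second-order regression deltas over neighbouring frames. The following is a minimal sketch of that common formulation, not necessarily the one used in this work: the function names `delta` and `stack_dynamic_features`, the window half-width `width=2`, and the edge-padding choice are all assumptions made for illustration. The input is assumed to be an MFCC matrix of shape (frames, coefficients) that has already been extracted.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based delta along the time axis.

    features: array of shape (n_frames, n_coeffs).
    width: assumed half-window size N in
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Repeat the first and last frames so every frame has `width` neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_dynamic_features(mfcc, width=2):
    """Concatenate static MFCCs with velocity and acceleration per frame."""
    velocity = delta(mfcc, width)
    acceleration = delta(velocity, width)  # delta of the delta
    return np.hstack([mfcc, velocity, acceleration])
```

With 13 static coefficients per frame, `stack_dynamic_features` would yield a 39-dimensional vector per frame, to which pitch could be appended before classification or feature selection.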
