Gender classification by processing emotional speech

Gender information improves emotion classification accuracy. Accordingly, gender classification for emotional speech is an interesting research topic. A pool of 1418 features is created, 619 of which are tested for the first time in gender classification; to the best of the authors' knowledge, this is the largest such feature set. The features comprise statistics of the pitch, formant, and energy contours, as well as autocorrelation features, MPEG-7 descriptors, Fujisaki model parameters, jitter, and shimmer. A feature selection algorithm derives 15 features, which are input to a nearest neighbor classifier, a radial-basis function neural network, a probabilistic neural network, support vector machines, a discriminant analysis-based classifier, a classification tree, a self-organizing map, and a neural gas network. Two databases are employed: the Berlin Database of Emotional Speech and the Danish Emotional Speech database. Perfect classification accuracy is obtained. A systematic comparative study of the performance gains among the classifiers and the variants of each particular classifier is undertaken. Furthermore, the databases are assessed with respect to gender classification accuracy. The results advance the state of the art.
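The pipeline the abstract describes (feature extraction → feature selection → classification) can be illustrated with a minimal sketch. This is not the authors' implementation: it computes only a handful of the 1418 features (pitch- and energy-contour statistics), swaps in scikit-learn's greedy forward selection for the paper's feature selection algorithm, and uses a 1-NN classifier as one of the eight classifiers studied. The libraries (librosa, scikit-learn) and the variables `wav_paths` and `genders` are assumptions standing in for a labelled corpus such as EMO-DB.

```python
# Minimal sketch of the features -> selection -> 1-NN pipeline, under the
# assumptions stated above; only a toy subset of the paper's features.
import numpy as np
import librosa
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def contour_stats(x):
    """Summary statistics of a contour: mean, std, range, median."""
    x = x[np.isfinite(x)]  # drop NaNs (e.g., unvoiced frames in the pitch track)
    if x.size == 0:
        return [0.0, 0.0, 0.0, 0.0]
    return [x.mean(), x.std(), x.max() - x.min(), np.median(x)]


def extract_features(path):
    """Pitch- and energy-contour statistics for one utterance."""
    y, sr = librosa.load(path, sr=None)
    f0, _, _ = librosa.pyin(y, fmin=60.0, fmax=500.0, sr=sr)  # pitch contour
    rms = librosa.feature.rms(y=y)[0]                         # energy contour
    return np.array(contour_stats(f0) + contour_stats(rms))


# `wav_paths` and `genders` are hypothetical placeholders for a labelled corpus.
X = np.vstack([extract_features(p) for p in wav_paths])
y = np.array(genders)

# 1-NN on z-scored features, mirroring the paper's nearest neighbor classifier.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))

# Greedy forward selection; the paper selects 15 of 1418 features, whereas here
# we keep 4 of the 8 toy features just to show the mechanism.
selector = SequentialFeatureSelector(knn, n_features_to_select=4,
                                     direction="forward", cv=5)
selector.fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))
print("CV accuracy:", cross_val_score(knn, selector.transform(X), y, cv=5).mean())
```

Wrapping the scaler and classifier in one pipeline matters: the selector re-fits the whole pipeline per candidate subset, so the z-scoring never leaks statistics across cross-validation folds.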
