GMM-based speaker gender and age classification after voice conversion

This paper describes an experiment using Gaussian mixture models (GMMs) to classify speaker gender and age and to evaluate how successful a voice conversion process has been. The main motivation was to test whether this type of classifier can serve as an alternative to the conventional listening tests used in speech evaluation. The proposed two-level GMM classifier was first verified on the detection of four age categories (child, young, adult, senior) and on gender discrimination for all but children's voices in Czech and Slovak. The classifier was then applied to determine the gender and age of the original speech of the basic adult male and female voices as well as of its converted versions. The resulting classification accuracy confirms the usability of the proposed evaluation method and the effectiveness of the performed voice conversions.
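The abstract does not spell out the classifier's internals, so the following is only a minimal sketch of one plausible reading of a two-level GMM classifier. It assumes that level 1 scores an utterance's feature vectors against one GMM per age category and picks the category with the highest mean log-likelihood, and that level 2, for non-child decisions, compares separate male and female GMMs trained for that age category. The diagonal-covariance EM implementation, the names `DiagGMM`, `TwoLevelClassifier` and `accuracy`, the synthetic 2-D features, and all parameter values are hypothetical stand-ins for the paper's real spectral and prosodic features and settings.

```python
import numpy as np

class DiagGMM:
    """Hypothetical minimal GMM with diagonal covariances, trained by EM."""

    def __init__(self, n_components=4, n_iter=50, seed=0, reg=1e-3):
        self.k, self.n_iter, self.reg = n_components, n_iter, reg
        self.rng = np.random.default_rng(seed)

    def _log_gauss(self, X):
        # log N(x | mu_j, diag(var_j)) for every vector x and component j -> (n, k)
        diff = X[:, None, :] - self.means[None, :, :]
        return -0.5 * (np.sum(diff**2 / self.vars[None], axis=2)
                       + np.sum(np.log(2 * np.pi * self.vars), axis=1)[None])

    def fit(self, X):
        n, _ = X.shape
        self.means = X[self.rng.choice(n, self.k, replace=False)]
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: responsibilities, shifted by the row max for numerical stability
            log_p = self._log_gauss(X) + np.log(self.weights)
            log_p -= log_p.max(axis=1, keepdims=True)
            resp = np.exp(log_p)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights, means and regularised diagonal variances
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = resp.T @ X / nk[:, None]
            second = resp.T @ X**2 / nk[:, None]
            self.vars = np.maximum(second - self.means**2, 0.0) + self.reg
        return self

    def score(self, X):
        # mean per-vector log-likelihood of X under the mixture (log-sum-exp)
        log_p = self._log_gauss(X) + np.log(self.weights)
        m = log_p.max(axis=1, keepdims=True)
        return float(np.mean(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))))

AGES = ("child", "young", "adult", "senior")

class TwoLevelClassifier:
    """Assumed scheme: level 1 picks the age class, level 2 the gender (not for children)."""

    def fit(self, data):
        # data maps (age, gender) -> feature matrix; gender is None for children
        self.age_models, self.gender_models = {}, {}
        for age in AGES:
            frames = [X for (a, _), X in data.items() if a == age]
            self.age_models[age] = DiagGMM().fit(np.vstack(frames))
            if age != "child":
                self.gender_models[age] = {
                    g: DiagGMM().fit(X) for (a, g), X in data.items() if a == age}
        return self

    def predict(self, X):
        age = max(self.age_models, key=lambda a: self.age_models[a].score(X))
        if age == "child":
            return age, None
        gmms = self.gender_models[age]
        return age, max(gmms, key=lambda g: gmms[g].score(X))

def accuracy(clf, labelled):
    # fraction of utterances whose (age, gender) prediction matches the label
    return sum(clf.predict(X) == label for X, label in labelled) / len(labelled)

# Synthetic, well-separated 2-D clusters standing in for real speech features
rng = np.random.default_rng(1)
centres = {("child", None): (0, 0),
           ("young", "M"): (4, -2), ("young", "F"): (4, 2),
           ("adult", "M"): (8, -2), ("adult", "F"): (8, 2),
           ("senior", "M"): (12, -2), ("senior", "F"): (12, 2)}

def sample(centre, n):
    return rng.normal(centre, 0.5, size=(n, 2))

clf = TwoLevelClassifier().fit({lab: sample(c, 200) for lab, c in centres.items()})
tests = [(sample(c, 50), lab) for lab, c in centres.items()]
acc = accuracy(clf, tests)
print(acc)
```

Under the same assumptions, evaluating a voice conversion would mean passing converted utterances through the trained classifier and checking whether the predicted gender and age match the conversion target rather than the source speaker; this workflow is illustrative, not the paper's documented procedure.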