Automatic male-female voice discrimination

In this work, we present a simple novel scheme for classifying audio speech signals into male and female speech. In the context of content-based multimedia indexing, gender identification from the speech signal is an important task. Popular salient low-level time-domain acoustic features that are closely related to the physical properties of the source audio signal, namely the zero-crossing rate (ZCR) and short-time energy (STE), are used for this discrimination together with spectral flux, a low-level frequency-domain feature. RANSAC and a neural network are used as classifiers. The experimental results demonstrate the efficiency of the proposed scheme.
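The three features named above have standard frame-based definitions, sketched below in NumPy. This is a minimal illustration only: the frame length, hop size, and normalization are assumptions, and the paper's actual feature extraction and classifier stages (RANSAC and the neural network) are not reproduced here.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return np.mean(frame ** 2)

def spectral_flux(prev_frame, frame):
    """L2 distance between normalized magnitude spectra of consecutive frames."""
    prev_mag = np.abs(np.fft.rfft(prev_frame))
    cur_mag = np.abs(np.fft.rfft(frame))
    # Normalize so flux reflects change in spectral shape, not in loudness
    prev_mag /= (np.sum(prev_mag) + 1e-12)
    cur_mag /= (np.sum(cur_mag) + 1e-12)
    return np.sqrt(np.sum((cur_mag - prev_mag) ** 2))

def frame_features(signal, frame_len=400, hop=160):
    """Compute (ZCR, STE, spectral flux) per frame for a mono signal
    (frame_len/hop correspond to 25 ms / 10 ms at 16 kHz, an assumed setup)."""
    feats, prev = [], None
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        flux = spectral_flux(prev, frame) if prev is not None else 0.0
        feats.append((zero_crossing_rate(frame),
                      short_time_energy(frame),
                      flux))
        prev = frame
    return np.array(feats)
```

Per-frame feature vectors of this kind would then be fed to the classifiers; lower ZCR and different energy/flux statistics are typically expected for male speech owing to its lower fundamental frequency.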
