A new direct access framework for speaker identification system

This paper presents a new Direct Access Framework (DAF) for speaker identification systems, which identifies a speaker from the original characteristics of the human voice. The direct access method identifies an object from parts of the object itself; these parts are called its original characteristics. The proposed framework consists of two parts, the enrolment process and the identification process, and comprises the following phases: speech preprocessing, speaker feature extraction, feature normalization, feature selection, speaker modeling, direct access method, and speaker matching. We used an Indonesian speaker dataset containing 2,140 speech files from 142 speakers (97 male and 45 female). The identification accuracy based on MFCC features is 94.38%, and the accuracy of gender-based speaker classification is up to 100% based on pitch, flatness, brightness, and roll-off features. The proposed framework can help researchers in the speaker identification domain implement their proposed algorithms or models to obtain the best speaker identification system for various datasets. DAF could also be used as a basic framework for other multimedia data such as images or video.
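The two-part structure described above (enrolment, then identification) can be illustrated with a minimal sketch. The abstract does not state which modeling or matching algorithms the framework uses, so everything here is an assumption chosen for illustration: each utterance is taken to be already reduced to a fixed-length feature vector (for example, averaged MFCCs), normalization is a z-score, the speaker model is a per-speaker mean vector, and matching is nearest-centroid by Euclidean distance. The function names `enrol` and `identify` and the toy data are hypothetical, and the feature selection and direct access phases are omitted.

```python
import math
from statistics import mean, pstdev


def fit_normalizer(vectors):
    """Compute per-dimension mean and standard deviation (z-score parameters)."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant dimensions to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(vector, means, stds):
    """Apply z-score normalization to one feature vector."""
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]


def enrol(training):
    """Enrolment: build one model per speaker.

    `training` maps a speaker id to a list of feature vectors.
    The model here is simply the mean normalized vector (an assumption,
    not the modeling method of the paper).
    """
    all_vectors = [v for vectors in training.values() for v in vectors]
    means, stds = fit_normalizer(all_vectors)
    models = {}
    for speaker, vectors in training.items():
        normed = [normalize(v, means, stds) for v in vectors]
        models[speaker] = [mean(col) for col in zip(*normed)]
    return {"means": means, "stds": stds, "models": models}


def identify(system, vector):
    """Identification: return the enrolled speaker whose model is closest."""
    query = normalize(vector, system["means"], system["stds"])
    return min(
        system["models"],
        key=lambda speaker: math.dist(query, system["models"][speaker]),
    )


# Toy two-dimensional features for two hypothetical speakers.
training = {
    "spk_a": [[1.0, 2.0], [1.2, 1.8]],
    "spk_b": [[4.0, 0.5], [3.8, 0.7]],
}
system = enrol(training)
result = identify(system, [1.1, 2.1])
```

The normalization parameters are fitted during enrolment and reused at identification time, so that a query is compared against the speaker models in the same feature space.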
