Novel model for pitch estimation using hybrid DWT-DCT HPS

Pitch is an important feature of speech, so pitch extraction is a vital step in speech coding, speaker recognition, speech synthesis, speech recognition, and many similar applications. Existing algorithms, such as Discrete Cosine Transform (DCT) based pitch extraction and the harmonic product spectrum (HPS) obtained from the DCT, are useful for pitch extraction. In this paper, we propose a hybrid Discrete Wavelet Transform-Discrete Cosine Transform HPS (DWT-DCT HPS) method for pitch extraction. A voice sample is taken and segmented into 36 bands in the frequency domain. A spatial-domain transformation is then applied to those bands to extract the most prominent features. The Gross Pitch Error (GPE) and Fine Pitch Error (FPE) criteria are used to measure the accuracy of the proposed method. The results show that the proposed hybrid method yields lower pitch error than DCT-HPS.
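As an illustration of the baseline idea and the evaluation criteria named above, the following is a minimal sketch, not the proposed method. It assumes an FFT magnitude spectrum in place of the DCT, and it omits the DWT band decomposition entirely. The function names, the 20% gross-error threshold, and the definition of FPE as the standard deviation (in Hz) of the non-gross deviations are assumptions based on common conventions; the abstract does not specify them.

```python
import numpy as np


def hps_pitch(frame, sample_rate, num_harmonics=4, fmin=50.0, fmax=500.0):
    """Estimate the pitch (Hz) of one voiced frame with a harmonic product spectrum.

    Assumption: an FFT magnitude spectrum is used here, not the DCT of the paper.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    n_fft = 4 * len(frame)  # zero-padding gives a finer frequency grid
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))

    # Multiply the spectrum by copies of itself decimated by 2, 3, ...;
    # harmonics of the true pitch then line up on the fundamental bin.
    hps = spectrum.copy()
    for h in range(2, num_harmonics + 1):
        decimated = spectrum[::h]
        hps[:len(decimated)] *= decimated

    # Only bins that received every harmonic factor are comparable.
    valid = len(spectrum[::num_harmonics])
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)[:valid]
    in_band = (freqs >= fmin) & (freqs <= fmax)
    peak = np.argmax(np.where(in_band, hps[:valid], 0.0))
    return float(freqs[peak])


def pitch_errors(estimated, reference, gross_threshold=0.2):
    """Return (GPE in percent, FPE in Hz) over voiced frames.

    Assumptions: a frame is a gross error when its relative deviation exceeds
    gross_threshold; FPE is the standard deviation of the remaining deviations.
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    gross = np.abs(est - ref) / ref > gross_threshold
    gpe = 100.0 * float(np.mean(gross))
    fine = (est - ref)[~gross]
    fpe = float(np.std(fine)) if fine.size else 0.0
    return gpe, fpe
```

In use, `hps_pitch` would be called on each voiced frame of a recording, and the resulting pitch track would be passed to `pitch_errors` together with reference pitch values for the same frames.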
