Query by Example of Speaker Audio Signals using Power Spectrum and MFCCs

Search engine is the popular term for an information retrieval (IR) system. Search engines are typically based on full-text indexing. Moving from text to multimedia data types, such as images or sounds stored in large databases, makes the retrieval process more complex. This paper introduces the use of language- and text-independent speech as the input query to a large sound database by means of a speaker identification algorithm. The method consists of two main processing steps: first, vocal and non-vocal segments are separated; the vocal segments are then passed to speaker identification so that the audio can be queried by speaker voice. For speaker identification and the audio query-by-example process, we estimate the similarity between the example signal and the samples in the queried database by calculating the Euclidean distance between their acoustic features, namely the Mel-frequency cepstral coefficients (MFCCs) and the energy spectrum. Simulations show good performance at a sustainable computational cost, with an average accuracy rate of more than 90%.
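
The distance-based matching step can be illustrated with a minimal sketch. It assumes that each recording has already been converted to per-frame feature vectors (for example, MFCCs concatenated with energy-spectrum values) and that the frames are averaged into a single vector per recording before comparison; the averaging step, the function names `mean_vector` and `rank_by_distance`, and the toy numbers are illustrative assumptions, not details given in the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one feature vector per audio frame


def mean_vector(frames: Frames) -> List[float]:
    """Average per-frame feature vectors into one vector per recording.

    Assumption: frame averaging is used only for illustration; the
    abstract does not state how frames are aggregated.
    """
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / n for i in range(dim)]


def rank_by_distance(query: Frames,
                     database: Dict[str, Frames]) -> List[Tuple[str, float]]:
    """Rank database entries by Euclidean distance to the query example.

    A smaller distance means the entry is more similar to the query.
    """
    q = mean_vector(query)
    scored = [(name, math.dist(q, mean_vector(frames)))
              for name, frames in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy two-dimensional features standing in for real MFCC/energy vectors.
query_frames = [[1.0, 2.0], [3.0, 4.0]]
toy_database = {
    "speaker_a": [[2.0, 3.0], [2.0, 3.0]],
    "speaker_b": [[5.0, 7.0]],
}
ranking = rank_by_distance(query_frames, toy_database)
print(ranking)
```

In this toy case the query averages to the same vector as `speaker_a`, so that entry is ranked first with distance 0, while `speaker_b` lies at distance 5. In a real system the feature vectors would come from a signal-processing front end, and the vocal/non-vocal separation described above would run before this step.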
