An efficient speech recognition system for arm‐disabled students based on isolated words

Over the past decades, the need to improve human‐machine communication systems has grown, since such systems are essential not only for performing actions but also for obtaining information, especially in educational applications. Any communication system should also offer an efficient, easy way to interact, with the lowest possible error rate. The keyboard, mouse, trackball, touch‐screen, and joystick are all examples of tools built for mechanical human‐to‐machine interaction. However, a system that accepts oral speech, the natural form of communication between humans, can be more practical than mechanical devices for students in general, and it is a necessity for arm‐disabled students who cannot use their arms to handle traditional educational tools such as pens and notebooks. In this paper, we present a speech recognition system that allows arm‐disabled students to control computers by voice as an aid in the educational process. When a student speaks into a microphone, the speech is segmented into isolated words, which are compared against a predefined database containing a large number of spoken words to find a match. Each recognized word is then mapped to its associated task, which the computer performs, such as opening a teaching application or renaming a file. The speech recognition process discussed in this paper involves two separate approaches: the first is based on double‐threshold voice activity detection (VAD) and improved Mel‐frequency cepstral coefficients (MFCC), while the second is based on the discrete wavelet transform (DWT) combined with a modified MFCC algorithm. Using the best parameter values for these techniques, the proposed system achieved a recognition rate of 98.7% with the first approach and 98.86% with the second. The second approach is thus slightly more accurate but slower in processing, which is a critical consideration for a real‐time system.
Both proposed approaches were compared with other relevant approaches and achieved noticeably higher recognition rates.
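The exact double-threshold rule is not specified here. A common energy-based variant triggers a word only where short-time energy exceeds a high threshold, then widens the word's boundaries outward while the energy stays above a lower threshold. The sketch below assumes that rule; the function names, frame sizes, and threshold values are illustrative assumptions, and practical detectors often add a zero-crossing-rate test as well.

```python
def frame_energies(samples, frame_len, hop):
    """Short-time energy (sum of squared samples) of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def double_threshold_vad(energies, high, low):
    """Return (first_frame, last_frame) pairs of detected speech segments.

    Assumed rule: a segment is triggered only where energy exceeds `high`;
    its edges are then extended while neighbouring frames stay above `low`.
    """
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > high:
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1  # extend the word start backwards
            end = i
            while end + 1 < n and energies[end + 1] > low:
                end += 1  # extend the word end forwards
            segments.append((start, end))
            i = end + 1  # resume scanning after this segment
        else:
            i += 1
    return segments
```

For example, with frame energies `[0, 0, 1, 5, 9, 5, 1, 0, 0, 8, 0]`, `high = 4`, and `low = 0.5`, the detector returns `[(2, 6), (9, 9)]`. The second burst is kept even though it lasts a single frame, which is why practical systems usually also enforce a minimum word duration.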
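The "improved" and "modified" MFCC variants are not detailed here, so the following is only a minimal sketch of the standard MFCC pipeline for a single frame: pre-emphasis, Hamming window, power spectrum, mel-spaced triangular filterbank, logarithm, and DCT-II. It uses only the Python standard library and a direct DFT for readability (an FFT would be used in practice); the filter count, coefficient count, and pre-emphasis factor are common defaults, not values taken from the paper.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13, pre_emph=0.97):
    """Standard MFCCs of one speech frame (a list of floats)."""
    n = len(frame)
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - a * x[t - 1].
    emph = [frame[0]] + [frame[t] - pre_emph * frame[t - 1] for t in range(1, n)]
    # Hamming window tapers the frame edges to reduce spectral leakage.
    win = [emph[t] * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)))
           for t in range(n)]

    # Power spectrum from a direct DFT over bins 0..n/2.
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        re = sum(win[t] * math.cos(2.0 * math.pi * k * t / n) for t in range(n))
        im = sum(win[t] * math.sin(2.0 * math.pi * k * t / n) for t in range(n))
        power.append((re * re + im * im) / n)

    # Triangular filter edges equally spaced on the mel scale, 0 Hz to Nyquist.
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [int((n + 1) * mel_to_hz(mel_max * i / (n_filters + 1)) / sample_rate)
             for i in range(n_filters + 2)]

    log_energies = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        energy = 0.0
        for k in range(left, min(right, n_bins - 1) + 1):
            if k < center:
                weight = (k - left) / (center - left)    # rising slope
            elif k == center:
                weight = 1.0                             # filter peak
            else:
                weight = (right - k) / (right - center)  # falling slope
            energy += weight * power[k]
        log_energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)

    # DCT-II decorrelates the log filterbank energies; keep the first n_coeffs.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]
```

Applied frame by frame (for instance 256-sample frames of 8 kHz audio), this yields one 13-dimensional feature vector per frame for each isolated word.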
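In the second approach, a discrete wavelet transform precedes the modified MFCC step. The wavelet family and decomposition depth are not given here, so the sketch below assumes the Haar wavelet, the simplest orthonormal choice: each level splits a signal into a low-frequency approximation band and a high-frequency detail band, each half as long as its input. How the paper combines the resulting sub-bands with MFCC is not stated and is not modelled here.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Multi-level decomposition: repeatedly split the approximation band.

    Returns the final approximation band and the detail bands, finest first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the input. Each level nominally halves the bandwidth of the approximation band, so for 8 kHz audio one level separates roughly 0 to 2 kHz from 2 to 4 kHz.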
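The matching method used against the predefined word database is not named here. Dynamic time warping (DTW) is one widely used technique for comparing isolated-word feature sequences of different lengths, so the sketch below assumes DTW with a Euclidean local distance and a single stored template per word; the word labels and feature values in the usage example are hypothetical.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Best of insertion, deletion, or diagonal match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, templates):
    """Return the vocabulary word whose stored template is nearest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

For example, with hypothetical one-dimensional templates `{"open": [[0.0], [1.0], [2.0]], "close": [[2.0], [1.0], [0.0]]}`, the slower utterance `[[0.0], [0.0], [1.0], [2.0], [2.0]]` is matched to `"open"` at DTW distance 0, because warping absorbs the repeated frames.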
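Once a word is recognized, it is mapped to the task the computer performs. A minimal sketch of that mapping is a lookup table from vocabulary words to actions. The command words and action functions below are hypothetical, and they return descriptive strings instead of actually launching programs or renaming files, so the example has no side effects.

```python
from typing import Callable, Dict


def open_teaching_app() -> str:
    # Hypothetical action; a real system might launch a program here.
    return "launching teaching application"


def rename_file() -> str:
    # Hypothetical action; a real system might prompt for the new file name.
    return "renaming selected file"


# Hypothetical vocabulary: each recognized word maps to exactly one action.
COMMANDS: Dict[str, Callable[[], str]] = {
    "open": open_teaching_app,
    "rename": rename_file,
}


def execute(word: str) -> str:
    """Run the action bound to a recognized word, or report it as unknown."""
    action = COMMANDS.get(word)
    if action is None:
        return f"unrecognized command: {word}"
    return action()
```

Keeping the vocabulary in a dictionary makes adding a new voice command a one-line change, and unknown words fall through to a harmless message instead of an error.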
