Bangla User Adaptive Word Speech Recognition: Approaches and Comparisons

This paper presents two novel approaches to Bangla word speech recognition, together with a comparative analysis. The first approach combines spectral analysis with fuzzy logic; the second uses Mel-Frequency Cepstral Coefficient (MFCC) analysis with a feed-forward back-propagation neural network. Because human speech is imprecise and ambiguous, fuzzy logic, which is itself founded on linguistic ambiguity, could serve as a precise tool for analyzing and recognizing it. The authors' systems are built around two visual representations of voiced signals: the Fourier energy spectrum and the MFCC. In essence, both are matrices that describe a sound by storing its energy and frequency content over discrete time. Decision making in the two systems is based on fuzzy logic and neural networks, respectively. Experimental results show that the fuzzy logic based system is 86% accurate and the Artificial Neural Network (ANN) based system is 90% accurate, whereas a commercial Hidden Markov Model (HMM) based speech recognizer achieves 73% accuracy on average. The research further finds that, although the ANN gives better overall recognition accuracy, the fuzzy logic based system is more accurate on "more difficult" or "polysyllabic" words. In terms of runtime performance, the fuzzy logic based system also outperforms the ANN based system.
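To make the "energy and frequency in discrete time" matrix concrete, the following is a minimal sketch of a short-time Fourier energy spectrum computed with NumPy. It is not the authors' implementation: the function name `energy_spectrogram`, the frame length, the hop size, the Hamming window, and the synthetic 1 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def energy_spectrogram(signal, frame_len=256, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) matrix of spectral energy.

    Rows index discrete time (frames); columns index frequency bins.
    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hamming(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # Energy is the squared magnitude of the one-sided FFT of each frame.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Hypothetical input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

energy = energy_spectrogram(tone)
# Frequency bin with the highest average energy across all frames.
peak_bin = int(np.argmax(energy.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 256
print(energy.shape, peak_hz)
```

Each row of `energy` describes one time frame and each column one frequency band, which is the kind of matrix a fuzzy or neural classifier could take as input. MFCC features would add a mel filterbank, a logarithm, and a discrete cosine transform on top of this spectrum.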
