Phoneme-based or isolated-word modeling speech recognition system? An overview

In this paper, speech theories and methodological concerns about the feature extraction and classification techniques widely used in speech recognition systems are surveyed and discussed. The shortcomings of isolated-word speech recognition are examined in comparison with its phoneme-based counterpart. This paper can be regarded as a very early step towards establishing a methodology for a more generic system with better accuracy and lower complexity. It is hoped that such a system can classify speech regardless of variation across languages or accents. A speaker-independent (SI) speech recognition system is required for this application and, indeed, for many other potential applications, such as a telephone network, where handling a large database of many different speakers is a primary requirement. Isolated-word ASR for fixed vocabularies has been successfully implemented using HMM, ANN and SVM, but it adapts poorly to other languages and grows in complexity as the vocabulary size increases. Conversely, phonemes, the smallest units of human speech sound, appear better suited to serve as the basic building blocks for cross-language mapping. In fact, phonetic transcription systems such as IPA and SAMPA are widely recognized and standardized for several languages. This paper investigates the potential of phonemes as language-independent phonetic units to overcome the lack of available training data and thereby achieve a more generic speech recognizer.
