Fuzzy decision fusion of complementary experts based on evolutionary cepstral coefficients for phoneme recognition

Finding an optimal representation of acoustic features remains an open challenge in automatic speech recognition research. As a step toward this goal, several approaches have proposed using evolutionary methods to optimize the filterbank from which cepstral coefficients are computed. However, a filterbank has a large number of parameters to optimize, which makes it difficult to guarantee that any single optimized filterbank provides the best representation for phoneme classification. Moreover, the optimization often yields several candidate solutions, each of which discriminates well between specific groups of phonemes; in other words, each filterbank has its own particular advantage. Aggregating the discriminative information provided by these filterbanks is therefore a challenging task. In this study, a number of complementary filterbanks are optimized so that each provides a different representation of the speech signal for phoneme classification using hidden Markov models (HMMs). Because fuzzy theory can effectively handle the uncertainties of classifiers trained on different representations of the speech data, the outputs of the HMM classifiers of each expert are fused using a fuzzy decision fusion scheme. The fusion scheme employs global and local confidence measures to formulate the reliability of each classifier in both the global and the local context when making the overall decision. Experiments were conducted on clean and noisy phonetic samples. The proposed method outperformed conventional Mel-frequency cepstral coefficients (MFCCs) under both conditions in terms of overall phoneme classification accuracy, and the fuzzy fusion scheme was shown to be capable of aggregating the complementary information provided by each filterbank.
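
The fusion idea summarized above can be illustrated with a minimal sketch. The abstract does not give the exact membership functions or confidence measures, so everything below is an assumption made for illustration: per-class HMM log-likelihoods are softmax-normalized into fuzzy memberships, the local confidence of an expert is taken as one minus the normalized entropy of its memberships for the current sample, the global confidence is taken as a fixed per-expert score (for example, validation accuracy), and the fused decision is a confidence-weighted sum. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def memberships_from_loglik(loglik):
    """Softmax-normalize per-class log-likelihoods into memberships in [0, 1].

    Assumption: the paper may derive memberships differently.
    """
    z = np.asarray(loglik, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def local_confidence(mu):
    """Assumed local measure: 1 - normalized entropy of the memberships.

    Close to 1 for a crisp decision, close to 0 for a near-uniform one.
    """
    mu = np.clip(np.asarray(mu, dtype=float), 1e-12, 1.0)
    entropy = -(mu * np.log(mu)).sum()
    return 1.0 - entropy / np.log(len(mu))


def fuse(logliks, global_conf):
    """Fuse the decisions of several experts for one sample.

    logliks:     array of shape (n_experts, n_classes)
    global_conf: array of shape (n_experts,), e.g. validation accuracies
    Returns the winning class index and the fused membership vector.
    """
    logliks = np.asarray(logliks, dtype=float)
    fused = np.zeros(logliks.shape[1])
    for ll, g in zip(logliks, global_conf):
        mu = memberships_from_loglik(ll)
        # Weight each expert by its global and local reliability.
        fused += g * local_confidence(mu) * mu
    return int(np.argmax(fused)), fused


if __name__ == "__main__":
    # Expert A is fairly sure of class 0; expert B is nearly undecided.
    logliks = [[-10.0, -12.0, -11.0],
               [-20.0, -20.1, -19.9]]
    label, fused = fuse(logliks, global_conf=[0.7, 0.7])
    print(label, fused)
```

In this toy case the undecided expert contributes almost nothing to the fused vector, which is the behavior a local confidence term is meant to provide; other aggregation operators (for example, min, max, or product) could be substituted for the weighted sum.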
