Fuzzy-based algorithm for Fongbe continuous speech segmentation

Text-independent speech segmentation is a challenging topic in computer-based speech recognition systems. This paper proposes a novel time-domain algorithm, based on fuzzy knowledge, for the continuous speech segmentation task via nonlinear speech analysis. Short-term energy, zero-crossing rate and singularity exponents are the time-domain features that we calculate at each point of the speech signal in order to extract the information needed to generate significant segments. This is done to identify phonemes or syllables and the transition fronts. Fuzzy logic is used to fuzzify the calculated features into three complementary sets (low, medium and high) and to perform a matching phase using a set of fuzzy rules. The outputs of the proposed algorithm are silence, phonemes, or syllables. Evaluated on Fongbe (an African tonal language spoken mainly in Benin, Togo and Nigeria), the algorithm achieved strong, efficient segmentation results.
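As a rough illustration of the feature-extraction and fuzzification steps described above, the following minimal sketch computes short-term energy and zero-crossing rate per frame and maps a normalized value onto three complementary sets. It is not the paper's implementation: the function names, the triangular membership shapes with breakpoints at 0, 0.5 and 1, the assumption that samples lie in [-1, 1], and the single "silence" rule are all hypothetical, and singularity exponents are left out.

```python
from typing import Dict, List


def frames(signal: List[float], size: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_term_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame (in [0, 1] if samples are in [-1, 1])."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def fuzzify(value: float) -> Dict[str, float]:
    """Map a value in [0, 1] to memberships in low / medium / high.

    Assumed triangular shapes; the three memberships always sum to 1,
    which is one way to read "complementary sets".
    """
    v = min(max(value, 0.0), 1.0)      # clamp to the unit interval
    low = max(0.0, 1.0 - 2.0 * v)      # 1 at v=0, falls to 0 at v=0.5
    high = max(0.0, 2.0 * v - 1.0)     # 0 until v=0.5, rises to 1 at v=1
    medium = 1.0 - low - high          # peaks at v=0.5
    return {"low": low, "medium": medium, "high": high}


def silence_degree(energy: float, zcr: float) -> float:
    """Hypothetical rule: IF energy is low AND zcr is low THEN silence.

    The fuzzy AND is taken as the minimum of the two memberships.
    """
    return min(fuzzify(energy)["low"], fuzzify(zcr)["low"])


if __name__ == "__main__":
    # Toy signal: a silent stretch followed by a rapidly alternating stretch.
    signal = [0.0] * 8 + [0.8, -0.8] * 4
    for frame in frames(signal, size=8, hop=8):
        e, z = short_term_energy(frame), zero_crossing_rate(frame)
        print(round(e, 3), round(z, 3), round(silence_degree(e, z), 3))
```

A full rule base would combine all three features and yield phoneme and syllable labels as well as silence; the single rule here only shows the matching mechanism.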
