Automatic boundary detection based on entropy measures for text-independent syllable segmentation

In this paper, we study boundary detection for syllable segmentation and describe an algorithm for text-independent syllable segmentation. The algorithm compares the performance of the Shannon, Tsallis and Rényi entropies for detecting the beginning and ending points of syllables in a speech signal. The Tsallis and Rényi entropies, both generalizations of Shannon entropy, quantify the degree of organization of the signal. They also provide relevant information, such as the degree of voicing, about the initial syllable segments obtained from the temporal dynamics of singularity exponents. The proposed method relies on an entropy-based aggregation measure to improve syllable boundary detection. We also show that Rényi entropy is the best suited of the three for efficient boundary detection. When evaluated, the algorithm performed well on two languages: Fongbe (an African tonal language spoken mainly in Benin, Togo, and Nigeria) and American English. Syllable boundary accuracy was first measured on a Fongbe dataset and then validated on the TIMIT dataset, with a margin of error below 5 ms.
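The following minimal Python sketch makes the three entropy measures concrete by computing the Shannon, Tsallis and Rényi entropies of short analysis frames. It is not the authors' algorithm. The abstract does not say which probability distribution is extracted from each frame, so the sketch assumes a normalized sample-energy distribution, p_i = x_i^2 / sum_j x_j^2. The function names, the frame parameters, and the default orders q = alpha = 2 are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def normalize(frame: Sequence[float]) -> List[float]:
    """Map a frame to a probability distribution p_i = x_i^2 / sum_j x_j^2.

    Hypothetical choice: the abstract does not specify the distribution used.
    """
    energy = [x * x for x in frame]
    total = sum(energy)
    if total == 0.0:
        # Silent frame: fall back to a uniform distribution.
        return [1.0 / len(frame)] * len(frame)
    return [e / total for e in energy]


def shannon(p: Sequence[float]) -> float:
    # H(p) = -sum_i p_i log p_i (natural log; zero-probability terms contribute 0).
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def tsallis(p: Sequence[float], q: float) -> float:
    # S_q(p) = (1 - sum_i p_i^q) / (q - 1); reduces to Shannon in the limit q -> 1.
    if q == 1.0:
        return shannon(p)
    return (1.0 - sum(pi ** q for pi in p if pi > 0.0)) / (q - 1.0)


def renyi(p: Sequence[float], alpha: float) -> float:
    # H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha); reduces to Shannon as alpha -> 1.
    if alpha == 1.0:
        return shannon(p)
    return math.log(sum(pi ** alpha for pi in p if pi > 0.0)) / (1.0 - alpha)


def frame_entropies(
    signal: Sequence[float],
    frame_len: int,
    hop: int,
    q: float = 2.0,
    alpha: float = 2.0,
) -> List[Tuple[float, float, float]]:
    """Return one (Shannon, Tsallis, Renyi) triple per full analysis frame."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        p = normalize(signal[start:start + frame_len])
        out.append((shannon(p), tsallis(p, q), renyi(p, alpha)))
    return out
```

For a frame whose energy is spread evenly over four samples, all three measures reach their maximum (Shannon and Rényi of order 2 both equal log 4, and Tsallis of order 2 equals 0.75). A frame whose energy sits in a single sample gives zero for all three. Frame-wise profiles like these could then be compared across the three measures. The abstract does not describe how the paper aggregates them into boundary decisions, so that step is left out here.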