The automatic segmentation of the vocal signal using predictive neural network

Automatic segmentation of the vocal signal precedes the feature extraction and emotion recognition/classification stages. The prosodic parameters, namely the fundamental frequency (F0) and the formants (F1-F4), and the cepstral coefficients (LPCC and MFCC) are extracted only from the vowel regions. The analysis tools of the SROL corpus use a hybrid hierarchical system that combines four segmentation methods, based on the autocorrelation function, the AMDF method, cepstral analysis, and the HPS method. Since the performance of this instrument has not yet been satisfactory, we analyzed other segmentation approaches in order to obtain the best possible segmentation accuracy. The predictive neural network used in this paper is in fact a simple perceptron, which can approximate quasi-periodic signals such as vowels with high accuracy. Consonants have noise-like properties and involve complicated transition processes. Consequently, when a simple neural network architecture is used, the prediction error is higher for consonants than for vowels.
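
The contrast between vowel and consonant prediction error can be illustrated with a minimal sketch. It is not the network described in the paper: it assumes a single linear neuron trained with a normalized LMS rule, and it uses synthetic signals (a sum of harmonics standing in for a vowel, white noise standing in for a consonant). The function names, predictor order, step size, and sampling values are all hypothetical choices made for illustration.

```python
import math
import random


def prediction_error(signal, order=8, mu=0.5, epochs=10):
    """Train one linear neuron to predict each sample from the previous
    `order` samples (normalized LMS) and return the mean squared
    prediction error measured over the last training pass."""
    weights = [0.0] * order
    mse = 0.0
    for _ in range(epochs):
        squared_sum = 0.0
        for n in range(order, len(signal)):
            past = signal[n - order:n]
            predicted = sum(w * x for w, x in zip(weights, past))
            error = signal[n] - predicted
            # Normalizing by the input energy keeps the update stable.
            energy = sum(x * x for x in past) + 1e-8
            step = mu * error / energy
            weights = [w + step * x for w, x in zip(weights, past)]
            squared_sum += error * error
        mse = squared_sum / (len(signal) - order)
    return mse


def vowel_like(num_samples, f0=120.0, fs=8000.0):
    """Quasi-periodic stand-in for a vowel: three harmonics of f0 (assumed values)."""
    return [
        sum(math.sin(2.0 * math.pi * k * f0 * i / fs) / k for k in (1, 2, 3))
        for i in range(num_samples)
    ]


def consonant_like(num_samples, seed=0):
    """Noise-like stand-in for a consonant: white Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


vowel_err = prediction_error(vowel_like(800))
noise_err = prediction_error(consonant_like(800))
print("vowel-like prediction error:    ", vowel_err)
print("consonant-like prediction error:", noise_err)
```

Under these assumptions, the periodic signal is predicted with a much smaller error than the noise, so comparing a frame-wise prediction error against a threshold is one plausible way to separate the two classes. Real speech would require tuning the order, step size, and threshold.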
