Language independent automatic speech segmentation into phoneme-like units on the base of acoustic distinctive features

Some topics in cognitive infocommunications require the processing of continuous speech, and they often require segmenting the speech signal into phoneme-sized units. This kind of segmentation is necessary when the desired behavior depends on speech timing, such as rhythm or the location of voiced sounds (for example in emotion or mood detection, language learning, and acoustic feature visualization). Segmentation systems based on acoustic-phonetic knowledge of speech can be realized in a language-independent way. In this paper we introduce a language-independent solution based on the segmentation of continuous speech into 9 broad phonetic classes. Classification and segmentation were performed using Hidden Markov Models. Three databases were used to evaluate the segmentation systems: the Hungarian MRBA, German KIEL, and English TIMIT databases. An average recognition result of 80% was obtained.
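To illustrate how an HMM can turn frame-level scores into time-aligned segments, the following is a minimal sketch and not the system described here. It assumes per-frame log-likelihoods are already available for each class, and it uses two hypothetical class labels, hand-set transition probabilities, and a 10 ms frame shift; the actual class inventory, features, and model topology are not given in this abstract.

```python
import math


def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state index for every frame.

    log_emit[t][s]  : log-likelihood of frame t under state s
    log_trans[p][s] : log-probability of moving from state p to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def frames_to_segments(path, labels, frame_shift_s=0.01):
    """Merge runs of identical frame labels into (label, start_s, end_s)."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((labels[path[start]],
                             round(start * frame_shift_s, 6),
                             round(t * frame_shift_s, 6)))
            start = t
    return segments


# Toy example: two hypothetical broad classes, six 10 ms frames.
labels = ["silence", "vowel"]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_emit = ([[math.log(0.9), math.log(0.1)]] * 3
            + [[math.log(0.1), math.log(0.9)]] * 3)

path = viterbi(log_emit, log_trans, log_init)
segments = frames_to_segments(path, labels)
print(segments)
```

In this toy setup the decoded path switches class once, so the six frames collapse into two segments whose boundary falls at 0.03 s. A real system would replace the hand-set scores with likelihoods from trained acoustic models.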
