Segmentation of Sindhi Speech using Formants

A speech segmentation method based on formant frequencies is presented. The method is evaluated on speech samples of Sindhi, a major language of the Indian subcontinent. It performs VCP (vowel-consonant-pause) segmentation and generates VCP strings for speech signals. These VCP strings, and the way they are formed, may enable a recognizer to identify speech on the fly, reducing the need for system training and making the recognizer more efficient. The method applies the velocity and acceleration parameters of rate-of-change dynamics to the formants of speech in order to segment it into vowel, consonant, and pause parts. Test-bed software that implements the proposed method and was used to conduct all experiments is also presented. Results show that the method is independent of both speaker and gender. Its segmentation performance is about 90% or higher in most conditions and over 60% under the worst conditions. The long-term goal is to develop an efficient speaker-independent speech recognizer based on the proposed method. A model of such a recognizer is also presented.
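The idea of labelling frames from formant velocity and acceleration can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of a single first-formant (F1) track, the rule that stable formants indicate a vowel, and every threshold value are assumptions chosen only for illustration.

```python
from itertools import groupby


def deltas(track):
    """First difference of a sequence, padded with 0.0 to keep its length."""
    return [0.0] + [b - a for a, b in zip(track, track[1:])]


def vcp_labels(f1_track, pause_floor=50.0, vel_limit=40.0, acc_limit=40.0):
    """Label each frame as 'V', 'C', or 'P' from a per-frame F1 track in Hz.

    Assumed rule (hypothetical, not taken from the paper):
      - F1 below pause_floor            -> pause
      - small velocity and acceleration -> vowel (formant is stable)
      - anything else                   -> consonant (formant is in transition)
    """
    velocity = deltas(f1_track)          # rate of change of F1 per frame
    acceleration = deltas(velocity)      # rate of change of the velocity
    labels = []
    for f1, v, a in zip(f1_track, velocity, acceleration):
        if f1 < pause_floor:
            labels.append("P")
        elif abs(v) <= vel_limit and abs(a) <= acc_limit:
            labels.append("V")
        else:
            labels.append("C")
    return labels


def vcp_string(labels):
    """Collapse runs of identical frame labels into a compact VCP string."""
    return "".join(key for key, _ in groupby(labels))


# Made-up F1 values: silence, a rising transition, a steady region, silence.
example_f1 = [0, 0, 300, 520, 700, 710, 705, 700, 0, 0]
print(vcp_string(vcp_labels(example_f1)))  # prints "PCVP"
```

In this sketch the frame labels are collapsed with `itertools.groupby`, so the output string records only the order of vowel, consonant, and pause segments and not their durations. A real system would likely use several formants and thresholds tuned on recorded data.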
