The ETSI extended distributed speech recognition (DSR) standards: client side processing and tonal language recognition evaluation

We present work that has been carried out in developing the ETSI extended DSR standards ES 202 211 and ES 202 212 (2003). These standards extend the previous ETSI DSR standards: basic front-end ES 201 108 and advanced (noise robust) front-end ES 202 050 respectively. The extensions enable enhanced tonal language recognition as well as server-side speech reconstruction capability. The paper discusses the client-side estimation of pitch and voicing class parameters whereas a companion paper discusses the server-side speech reconstruction. Experimental results show enhancement of tonal language recognition rates of proprietary recognition engines, when the standard extensions are used.

[1]  Daniel Waterman,et al.  Bounded variation in the mean , 2000 .

[2]  Michael Picheny,et al.  Robust methods for using context-dependent features and models in a continuous speech recognizer , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Darren Pearce,et al.  Enabling new speech driven services for mobile devices: An overview of the ETSI standards activities , 2000 .

[4]  Haiping Li,et al.  Recognize tone languages using pitch information on the main vowel of each syllable , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[5]  Meir Tzur,et al.  Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals , 2001, INTERSPEECH.

[6]  Eyal Yair,et al.  Super resolution pitch determination of speech signals , 1991, IEEE Trans. Signal Process..