Communication devices that perform distributed speech recognition (DSR) tasks currently transmit standardized coded parameters of speech signals. On a remote server, the speech signal is reconstructed from these parameters, and recognition features are extracted from the reconstruction. Since reconstruction losses degrade recognition performance, proposals are being considered to standardize DSR codecs that derive recognition features on the device and transmit them for direct use in recognition. However, such a codec must be embedded on the transmitting device alongside its existing standard codec. Performing recognition directly on the codec bitstream avoids these complications: no additional feature-extraction mechanism is required on the device, and there are no reconstruction losses on the server. We propose a method based on linear discriminant analysis (LDA) for extracting optimal feature sets from codec bitstreams and demonstrate that the features so derived improve recognition performance for the LPC, GSM, and CELP codecs. For GSM and CELP, we show that the performance is comparable to that obtained with uncoded speech and with standard DSR-codec features.
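The abstract does not give the details of the LDA-based extraction, so the following is only a minimal sketch of classical Fisher LDA, not the authors' procedure. It assumes that each frame's decoded codec parameters have already been collected into a fixed-length vector and that each frame has a class label; here both are replaced by synthetic data. The function name `lda_projection` and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def lda_projection(X, y, n_components):
    """Return a (d, n_components) Fisher LDA projection matrix.

    X: (n_frames, d) array of per-frame parameter vectors (assumed input).
    y: (n_frames,) array of integer class labels (assumed input).
    """
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)

    # Small ridge term so the Cholesky factorization stays well conditioned.
    Sw += 1e-6 * np.trace(Sw) / d * np.eye(d)

    # Solve Sb w = lambda Sw w by whitening with Sw = L L^T,
    # which turns it into a symmetric eigenproblem.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)

    # Keep the directions with the largest discriminant ratio.
    order = np.argsort(evals)[::-1][:n_components]
    return L_inv.T @ evecs[:, order]


# Synthetic stand-in for codec-parameter vectors: 3 classes, 10 dimensions.
rng = np.random.default_rng(0)
class_means = np.zeros((3, 10))
class_means[1, 0] = 4.0
class_means[2, 1] = 4.0
X = np.concatenate([class_means[c] + rng.normal(size=(200, 10)) for c in range(3)])
y = np.repeat(np.arange(3), 200)

W = lda_projection(X, y, n_components=2)  # at most (n_classes - 1) useful directions
Z = X @ W  # reduced feature vectors that a recognizer could consume
```

In this sketch the projection keeps at most one fewer dimension than the number of classes, because the between-class scatter has no more rank than that. In a real system the labels would come from the recognizer's own units and the input vectors from the codec bitstream, neither of which is specified here.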