Speech recognition for huge vocabularies by using optimized sub-word units

This paper describes approaches for decomposing the words of huge vocabularies (up to 2 million words) into smaller particles that are suitable as units of a recognition lexicon. Results are given for a Finnish dictation task and for a flat list of German street names.
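The abstract does not say how the particles are chosen or how a word is split into them. Purely as a hypothetical illustration of what "decomposing a word into sub-word particles" means, the sketch below assumes a particle inventory is already given and splits each word into the fewest particles from that inventory using dynamic programming. The function name `segment`, the example inventory and the fewest-units criterion are all assumptions for this sketch, not the paper's method.

```python
from __future__ import annotations

from typing import Optional


def segment(word: str, particles: set[str]) -> Optional[list[str]]:
    """Hypothetical sketch: split `word` into the fewest particles from
    `particles` (an assumed, pre-built inventory). Returns None if the
    word cannot be covered by the inventory."""
    n = len(word)
    # best[i] holds the shortest segmentation of word[:i] found so far.
    best: list[Optional[list[str]]] = [None] * (n + 1)
    best[0] = []
    # No particle is longer than this, so earlier start points can be skipped.
    max_len = max((len(p) for p in particles), default=0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is not None and piece in particles:
                candidate = prefix + [piece]
                current = best[end]
                # Keep the segmentation that uses fewer particles.
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    return best[n]


# Assumed toy inventory, loosely in the spirit of German street names.
P = {"haupt", "strasse", "str", "asse", "bahnhof", "weg"}
print(segment("hauptstrasse", P))  # ['haupt', 'strasse']
print(segment("bahnhofweg", P))    # ['bahnhof', 'weg']
```

With a particle lexicon like this, many full word forms can share a small set of units, which is the general motivation for sub-word lexica with very large vocabularies; how the paper actually optimizes the units is not described in this abstract.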
