The SSI large-vocabulary speaker-independent continuous speech recognition system

The Speech Systems Incorporated (SSI) commercial, large-vocabulary, speaker-independent, continuous speech recognition system is described. The system uses a novel approach to speech representation: a two-stage encoding of speech, with an intervening compression of acoustic frames (segmentation) between the encoding stages, and a linguistic decoding process suited to large, variable-duration segments. Binary decision trees trained with the maximum mutual information (MMI) criterion serve as the encoders. The features used in encoding are listed, and their ability to discriminate the phonetic content of speech is analyzed. Recognition results are given for a speaker-independent, continuous-speech, grammar-constrained radiology reporting product and for an isolated-word grammar of high perplexity.
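As an illustration only, the following minimal Python sketch shows two of the ideas named above under stated assumptions. It is not SSI's implementation, and the abstract does not give the training details. First, it grows one node of a binary decision tree by picking the threshold question on a single acoustic feature that maximizes the mutual information I(C; Q) between the phonetic class C and the yes/no answer Q. Second, it compresses a sequence of first-stage frame codes into variable-duration segments by merging runs of identical codes. Run-length merging is an assumed stand-in for the segmentation step. The functions `entropy`, `mutual_information`, `best_threshold`, and `segment` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(labels: Sequence[str], answers: Sequence[bool]) -> float:
    """I(C; Q) = H(C) - H(C | Q) for a binary question Q (assumed MMI split score)."""
    n = len(labels)
    yes = [c for c, a in zip(labels, answers) if a]
    no = [c for c, a in zip(labels, answers) if not a]
    conditional = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return entropy(labels) - conditional


def best_threshold(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, float]:
    """Try every question 'feature <= t'; return (t, I(C; Q)) for the best one.

    Returns (nan, -1.0) if the feature takes only one value (no split possible).
    """
    best = (float("nan"), -1.0)
    # The largest distinct value would send every frame to the 'yes' side, so skip it.
    for t in sorted(set(values))[:-1]:
        answers = [v <= t for v in values]
        mi = mutual_information(labels, answers)
        if mi > best[1]:
            best = (t, mi)
    return best


def segment(codes: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse runs of identical frame codes into (code, duration-in-frames) segments."""
    segments: List[Tuple[int, int]] = []
    for code in codes:
        if segments and segments[-1][0] == code:
            segments[-1] = (code, segments[-1][1] + 1)
        else:
            segments.append((code, 1))
    return segments


if __name__ == "__main__":
    # Toy feature values for four frames labeled with two hypothetical phone classes.
    print(best_threshold([0.1, 0.2, 0.8, 0.9], ["a", "a", "b", "b"]))
    print(segment([3, 3, 3, 7, 7, 3]))
```

In the toy run, the question "feature <= 0.2" separates the two classes perfectly, so it recovers the full 1 bit of class entropy. The six frame codes compress to three segments. A real MMI tree would repeat the split search over many features and nodes and would need a stopping rule. Those choices are not specified in the abstract and are left out here.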