Open vocabulary ASR for audiovisual document indexation

This paper investigates an open-vocabulary recognizer that allows new words to be introduced into the recognition vocabulary without retraining or adapting the language model. The method uses special word classes whose n-gram probabilities are estimated during training by discounting a mass of probability from the out-of-vocabulary (OOV) words. A part-of-speech tagger determines the word classes, both during language model training and during vocabulary adaptation. Metadata provided by a French audiovisual archive institute is used to identify important document-specific missing words, which are then added to the appropriate word classes in the system vocabulary. Pronunciations for the new words are derived by grapheme-to-phoneme conversion. On more than 3 hours of broadcast news data, this approach reduces the OOV rate by 0.35% and the word error rate by 0.6%, and 80% of the occurrences of the newly introduced words are correctly recognized.
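The class mechanism described above can be illustrated with a toy bigram model. What follows is a minimal sketch under our own assumptions, not the paper's implementation. Training tokens outside the vocabulary are replaced by a class token such as `<NOUN>` built from their POS tag, so the class tokens absorb the probability mass the OOV words would otherwise receive (a plain maximum-likelihood stand-in for the paper's discounting). A word added later is assigned to a class, and its probability is the class probability split uniformly among the class members. The names `train_bigram`, `OpenVocabLM`, `add_word` and `oov_rate`, the toy French sentences and the POS tags are all hypothetical, and no smoothing or backoff is applied.

```python
from collections import Counter, defaultdict


def train_bigram(tagged_sents, vocab):
    """MLE bigram in which every OOV token is replaced by its class token.

    tagged_sents: sentences as lists of (word, pos) pairs (hypothetical tagger output).
    vocab: set of in-vocabulary words.
    """
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        # OOV words become class tokens such as "<NOUN>", so each class token
        # collects the probability mass those OOV words would otherwise receive.
        tokens = ["<s>"] + [w if w in vocab else f"<{pos}>" for w, pos in sent] + ["</s>"]
        for h, t in zip(tokens, tokens[1:]):
            counts[h][t] += 1
    bigram = {}
    for h, c in counts.items():
        total = sum(c.values())
        bigram[h] = {t: n / total for t, n in c.items()}
    return bigram


class OpenVocabLM:
    """Toy open-vocabulary LM: P(w | h) = P(class(w) | h) * P(w | class(w))."""

    def __init__(self, bigram, vocab):
        self.bigram = bigram  # fixed after training, never re-estimated
        self.vocab = set(vocab)
        self.members = defaultdict(set)  # class token -> words added later

    def add_word(self, word, pos):
        # Vocabulary adaptation: only the class membership changes.
        self.members[f"<{pos}>"].add(word)

    def _token(self, word):
        """Map a word to (LM token, within-class probability), or (None, 0.0) if unknown."""
        if word in self.vocab or word in ("<s>", "</s>"):
            return word, 1.0
        for cls, words in self.members.items():
            if word in words:
                # Assumption: the class mass is shared uniformly among its members.
                return cls, 1.0 / len(words)
        return None, 0.0

    def known(self, word):
        return self._token(word)[0] is not None

    def prob(self, word, history):
        h, _ = self._token(history)
        t, within = self._token(word)
        if h is None or t is None:
            return 0.0  # unknown word: no probability (no backoff in this sketch)
        return self.bigram.get(h, {}).get(t, 0.0) * within


def oov_rate(tokens, lm):
    """Fraction of tokens the LM cannot map to a vocabulary word or a class member."""
    return sum(not lm.known(w) for w in tokens) / len(tokens)


if __name__ == "__main__":
    train = [
        [("le", "DET"), ("président", "NOUN"), ("parle", "VERB")],
        [("le", "DET"), ("ministre", "NOUN"), ("parle", "VERB")],
        [("le", "DET"), ("député", "NOUN"), ("vote", "VERB")],
    ]
    vocab = {"le", "président", "parle", "vote"}
    lm = OpenVocabLM(train_bigram(train, vocab), vocab)
    test = ["le", "ministre", "parle"]
    print(oov_rate(test, lm), lm.prob("ministre", "le"))
    lm.add_word("ministre", "NOUN")  # e.g. a word found in document metadata
    print(oov_rate(test, lm), lm.prob("ministre", "le"))
```

Adding `ministre` changes only the class membership table; the bigram probabilities estimated at training time are reused unchanged, which is the property that makes retraining unnecessary in this toy setting.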