Experimenting with Text Creation by Natural-Language, Large-Vocabulary Speech Recognition

In recent years the probabilistic approach to speech recognition has allowed the development of high-performance, large-vocabulary speech recognition systems [1] [2]. At the IBM Rome Scientific Center a speech-recognition prototype for the Italian language, based on this approach, has been built. The prototype is able to recognize, in real time, natural-language sentences built from a vocabulary of up to 20,000 words [4]. Each user must complete a one-time acoustic training phase (about 20 minutes long), during which the user is required to read aloud a predefined text. Words must be uttered with short pauses (a few centiseconds) between them. The prototype architecture is based on a personal computer equipped with special hardware. The first system we developed was aimed at a business and finance lexicon. Many laboratory tests have shown the effectiveness of the prototype as a tool for creating texts by voice. After a first phase during which in-house experiments were carried out [5], the need arose to test the system in real work environments and for different applications. Two applications were considered: the dictation of radiological reports and of insurance company documents. Due to their characteristics, these applications seemed very well suited to our purposes. Since the vocabulary of the recognizer must be predefined, we had to adapt the system to the lexicon required by the new applications. The paper describes the techniques developed to efficiently adapt the basic components of the recognizer: the acoustic and language models. The results obtained from experiments with automatic text dictation in real working conditions are also presented.