The development and evaluation of large vocabulary, speaker-independent continuous speech recognition systems have focused mainly on American English. In this paper we present the work done to date on the development of a hybrid large vocabulary, speaker-independent continuous speech recognition system for European Portuguese. Because no sufficiently large speech and text database was available for developing such a system, we started collecting a large database and, at the same time, began developing a baseline system based on a smaller database. On this baseline system we applied techniques for automatic segmentation and labeling, in parallel with the development of a basic lexicon and language model for Portuguese. In the last part of this paper we also present the first steps of our work on the new database.