TC-STAR 2006 Automatic Speech Recognition Evaluation: The UVIGO System

This paper describes the ongoing development of the University of Vigo’s Automatic Speech Recognition system (UVIGO) for the automatic transcription of Spanish European Parliamentary Plenary sessions and Spanish Parliamentary sessions. The system was developed to participate in the Spanish-language section of the 2006 TC-STAR Automatic Speech Recognition evaluation campaign. The UVIGO system was derived from the University of Vigo’s Galician Broadcast News (BN) Transcription system by adapting the BN acoustic and language models to the TC-STAR domain. We present a detailed discussion of the front-end processing, the acoustic and language modelling, and the decoding process. The proposed decoding strategy was developed to make the best possible use of gender- and speaker-dependent acoustic models without a prior gender or speaker identification process. In addition to describing the system architecture and reporting the evaluation results, we highlight further improvements that we plan to make to the overall system.
