Automatic Subtitling of the Basque Parliament Plenary Sessions Videos

Spanish law requires that video content offered on the web by public administration agencies be subtitled so that people with hearing impairments can follow it. The automatic bilingual video subtitling system described in this paper has been applied to the videos of plenary sessions that the Basque Parliament posts on its website (http://www.parlamentovasco.euskolegebiltzarra.org/), and has been running since September 2010. A distinctive feature of the system is its use of a simple phonetic decoder based on a joint selection of Basque and Spanish phone models, since it is not unusual for parliamentarians to mix the two languages. The system uses the manually transcribed Session Diaries (almost verbatim, but containing some errors) as subtitles, synchronizing text and audio by means of an acoustic decoder, a multilingual orthographic-phonetic transcriber and a very-large-symbol-sequence aligner.
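The synchronization step can be illustrated with a minimal sketch, assuming a plain edit-distance global alignment between two phone sequences: the reference phones produced by transcribing the Session Diary text, and the time-stamped phones produced by the acoustic decoder. The function names `align` and `ref_start_times`, the unit edit costs, and the toy data are illustrative assumptions, not the system's actual aligner, which has to handle far longer sequences than this quadratic-memory version can.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def align(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Globally align two symbol sequences with unit edit costs.

    Returns (ref_index, hyp_index) pairs; None marks a gap on that side.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner, preferring match/substitution.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))  # reference phone with no decoded match
            i -= 1
        else:
            pairs.append((None, j - 1))  # decoded phone absent from the text
            j -= 1
    pairs.reverse()
    return pairs


def ref_start_times(ref: List[str], hyp: List[str],
                    hyp_starts: List[float]) -> List[Optional[float]]:
    """Give each reference phone the start time of its aligned decoded phone."""
    times: List[Optional[float]] = [None] * len(ref)
    for r, h in align(ref, hyp):
        if r is not None and h is not None:
            times[r] = hyp_starts[h]
    return times


# Toy example: the decoder inserted a spurious phone "x" at 0.1 s.
print(ref_start_times(["a", "b", "c"], ["a", "x", "b", "c"], [0.0, 0.1, 0.2, 0.3]))
```

In a sketch like this, reference phones left without a time (because the decoder missed them) would need to be interpolated from their neighbours before subtitle boundaries are placed; how the described system handles that case is not stated in the abstract.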
