Assessing the Performance of Automatic Speech Recognition Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation Workflows

In this paper, we report on a two-part experiment that assessed and compared the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was conducted with a sample of speakers of three major languages who had different linguistic profiles: non-native speakers of English, non-native speakers of French, and native speakers of Spanish. The main objective was to examine ASR performance in translation dictation (TD) and medical dictation (MD) workflows with and without manual transcription. We discuss the advantages and drawbacks of a particular ASR approach on different computational platforms when it is used by various speakers of a given language. These speakers may differ in accent and in proficiency in that language. They may also differ in their competence and experience in dictating large volumes of text and in using ASR technology. Finally, we identify several areas for future research.
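The abstract does not state which metric was used to compare the ASR systems. Word error rate (WER) is one common way to quantify recognition accuracy. WER is the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of reference words. The sketch below shows how that metric could be computed. It is an illustration under that assumption, not the authors' method. The function name `word_error_rate` is hypothetical. The sketch also assumes whitespace tokenization with no case or punctuation normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes plain whitespace tokenization; no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Invented example, not data from the experiment: one deletion ("no")
    # and one substitution ("allergies" -> "allergy") over 6 reference words.
    print(round(word_error_rate("the patient has no known allergies",
                                "the patient has known allergy"), 3))  # 0.333
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the recognizer produces many spurious words.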
