A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks

We present a demonstration of a neural interactive-predictive system for multimodal sequence-to-sequence tasks. The system generates text predictions for several sequence-to-sequence tasks: machine translation, image captioning, and video captioning. A human agent revises these predictions, introducing corrections at the character level. The system reacts to each correction by providing an alternative hypothesis that complies with the feedback provided by the user. The ultimate goal is to reduce the human effort required during this correction process. The system follows a client-server architecture: we developed a website that communicates with the neural model, which is hosted on a local server. From this website, the different tasks can be tackled within the interactive-predictive framework. We open-source all the code developed for building this system. The demonstration is hosted at http://casmacat.prhlt.upv.es/interactive-seq2seq.
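To make the interaction protocol concrete, the following minimal sketch simulates one correction session. It is an illustration under stated assumptions, not the code of the demonstrated system: the neural model is replaced by a toy function that picks the first entry of a fixed candidate list compatible with the validated prefix, and the names `complete` and `simulate_session` are hypothetical. The simulated user always corrects the first wrong character, and the number of corrections serves as a simple proxy for human effort.

```python
from typing import List


def complete(prefix: str, candidates: List[str]) -> str:
    """Toy stand-in for the neural model (an assumption, not the real system).

    Returns the first candidate that starts with the validated prefix,
    or the prefix itself when no candidate is compatible with it.
    """
    for candidate in candidates:
        if candidate.startswith(prefix):
            return candidate
    return prefix


def common_prefix_length(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def simulate_session(reference: str, candidates: List[str]) -> int:
    """Simulate a user who fixes the first wrong character at each step.

    Returns the number of corrections needed to reach the reference.
    """
    hypothesis = complete("", candidates)
    corrections = 0
    while hypothesis != reference:
        n = common_prefix_length(hypothesis, reference)
        corrections += 1
        if n == len(reference):
            # The hypothesis is too long; the user truncates it.
            hypothesis = reference
            break
        # The user types the correct character; everything up to and
        # including it becomes the validated prefix.
        validated_prefix = reference[: n + 1]
        hypothesis = complete(validated_prefix, candidates)
    return corrections


if __name__ == "__main__":
    captions = [
        "a dog runs on the grass",
        "a dog is running",
        "a cat sits on a mat",
    ]
    for target in captions:
        print(target, "->", simulate_session(target, captions), "corrections")
```

In a real deployment the call to `complete` would be a request from the web client to the server hosting the model, which would run a search constrained to the validated prefix rather than a lookup in a fixed list.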
