Florian Metze | Desmond Elliott | Lucia Specia | Ramon Sanabria | Ozan Caglayan | Shruti Palaskar | Loïc Barrault
[1] Christine D. Wilson,et al. Grounding conceptual knowledge in modality-specific systems , 2003, Trends in Cognitive Sciences.
[2] Ali Can Kocabiyikoglu,et al. Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation , 2018, LREC.
[3] Fethi Bougares,et al. NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems , 2017, Prague Bull. Math. Linguistics.
[4] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[5] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[6] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[7] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[8] Florian Metze,et al. Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach , 2016, INTERSPEECH.
[9] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[10] Rico Sennrich,et al. Nematus: a Toolkit for Neural Machine Translation , 2017, EACL.
[11] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition , 2006, The Journal of the Acoustical Society of America.
[12] Chin-Yew Lin,et al. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics , 2004, ACL.
[13] Alexander G. Hauptmann,et al. Instructional Videos for Unsupervised Harvesting and Learning of Action Examples , 2014, ACM Multimedia.
[14] Matt Post,et al. Improved speech-to-text translation with the Fisher and Callhome Spanish-English speech translation corpus , 2013, IWSLT.
[15] Bowen Zhou,et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.
[16] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[17] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[18] Desmond Elliott,et al. Findings of the Third Shared Task on Multimodal Machine Translation , 2018, WMT.
[19] Christopher Joseph Pal,et al. Movie Description , 2016, International Journal of Computer Vision.
[20] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Yoni Bauduin,et al. Audio-Visual Speech Recognition , 2004 .
[22] Benjamin Van Durme,et al. Annotated Gigaword , 2012, AKBC-WEKEX@NAACL-HLT.
[23] Khalil Sima'an,et al. Multi30K: Multilingual English-German Image Descriptions , 2016, VL@ACL.
[24] Florian Metze,et al. End-to-end Multimodal Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Khalil Sima'an,et al. A Shared Task on Multimodal Machine Translation and Crosslingual Image Description , 2016, WMT.
[26] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Desmond Elliott,et al. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description , 2017, WMT.
[28] Phil Blunsom,et al. Teaching Machines to Read and Comprehend , 2015, NIPS.
[29] Olivier Pietquin,et al. End-to-End Automatic Speech Translation of Audiobooks , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[31] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[32] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[33] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Nazli Ikizler-Cinbis,et al. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures , 2016, J. Artif. Intell. Res.
[35] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[36] Navdeep Jaitly,et al. Sequence-to-Sequence Models Can Directly Translate Foreign Speech , 2017, INTERSPEECH.
[37] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[38] Florian Metze,et al. Visual features for context-aware speech recognition , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Jindrich Libovický,et al. Attention Strategies for Multi-Source Sequence-to-Sequence Learning , 2017, ACL.
[40] Yutaka Satoh,et al. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[41] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res.
[42] Paul Clough,et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems , 2006 .
[43] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[44] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.
[45] Jeffrey P. Bigham,et al. Multimodal summarization of complex sentences , 2011, IUI '11.
[46] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[47] Nazli Ikizler-Cinbis,et al. TasvirEt: A benchmark dataset for automatic Turkish description generation from images , 2016, 2016 24th Signal Processing and Communication Application Conference (SIU).
[48] Haoran Li,et al. Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video , 2017, EMNLP.
[49] Xirong Li,et al. Adding Chinese Captions to Images , 2016, ICMR.
[50] Regina Barzilay,et al. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , 2017, ACL 2017.
[51] David L. Chen and William B. Dolan. Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection , 2011 .
[52] Paul Over,et al. DUC in context , 2007, Inf. Process. Manag.
[53] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[54] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.
[55] Akikazu Takeuchi,et al. STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset , 2017, ACL.