Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Xin Wang | Sugato Basu | Pradyumna Narayana | Tsu-Jui Fu | An Yan | Kazoo Sone | Wanrong Zhu | William Yang Wang