Nanyun Peng | Ralph Weischedel | Te-Lin Wu | Marjorie Freedman | Alexander Spangher | Pegah Alipoormolabashi
[1] S. Tomkins. The Tomkins-Horn picture-arrangement test, 1952, Transactions of the New York Academy of Sciences.
[2] Mirella Lapata, et al. Probabilistic Text Structuring: Experiments with Sentence Ordering, 2003, ACL.
[3] Yueting Zhuang, et al. Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction, 2019, CVPR.
[4] Dhruv Batra, et al. Sort Story: Sorting Jumbled Images and Captions into Stories, 2016, EMNLP.
[5] Honglak Lee, et al. Sentence Ordering and Coherence Modeling using Recurrent Neural Networks, 2016, AAAI.
[6] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[7] Xinlei Chen, et al. Microsoft COCO Captions: Data Collection and Evaluation Server, 2015, ArXiv.
[8] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[9] Chris Callison-Burch, et al. Intent Detection with WikiHow, 2020, AACL.
[10] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[11] Francis Ferraro, et al. Visual Storytelling, 2016, NAACL.
[12] Bhavana Dalvi, et al. A Dataset for Tracking Entities in Open Domain Procedural Text, 2020, EMNLP.
[13] Yingming Li, et al. BERT-enhanced Relational Sentence Ordering Network, 2020, EMNLP.
[14] Chris Callison-Burch, et al. Reasoning about Goals, Steps, and Temporal Ordering with WikiHow, 2020, EMNLP.
[15] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[16] Zhe Gan, et al. HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training, 2020, EMNLP.
[17] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[18] Samy Bengio, et al. Order Matters: Sequence to sequence for sets, 2015, ICLR.
[19] Xuanjing Huang, et al. Neural Sentence Ordering, 2016, ArXiv.
[20] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[21] Rémi Calizzano, et al. Ordering sentences and paragraphs with pre-trained encoder-decoder transformers and pointer ensembles, 2021, DocEng.
[22] Kurt Keutzer, et al. How Much Can CLIP Benefit Vision-and-Language Tasks?, 2021, ICLR.
[23] Seungmin Seo, et al. Topic-Guided Coherence Modeling for Sentence Ordering by Preserving Global and Local Information, 2019, EMNLP.
[24] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[25] Ming-Hsuan Yang, et al. Unsupervised Representation Learning by Sorting Sequences, 2017, ICCV.
[26] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[27] Mauro Birattari, et al. Autonomous task sequencing in a robot swarm, 2018, Science Robotics.
[28] Zhongfei Zhang, et al. Deep Attentive Sentence Ordering Network, 2018, EMNLP.
[29] Jie Shao, et al. COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation, 2021, ICCV.
[30] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[31] Xuanjing Huang, et al. End-to-End Neural Sentence Ordering Using Pointer Network, 2016, ArXiv.
[32] Andrew N. Meltzoff, et al. Children's Representation and Imitation of Events: How Goal Organization Influences 3-Year-Old Children's Memory for Action Sequences, 2017, Cognitive Science.
[33] Cordelia Schmid, et al. VideoBERT: A Joint Model for Video and Language Representation Learning, 2019, ICCV.
[34] Cho-Jui Hsieh, et al. VisualBERT: A Simple and Performant Baseline for Vision and Language, 2019, ArXiv.
[35] Kaiming He, et al. Feature Pyramid Networks for Object Detection, 2016, CVPR.
[36] S. Baron-Cohen, et al. Mechanical, behavioural and Intentional understanding of picture stories in autistic children, 1986.
[37] Haejun Lee, et al. SLM: Learning a Discourse Language Representation with Sentence Unshuffling, 2020, EMNLP.
[38] Ahmed El Kholy, et al. UNITER: Learning UNiversal Image-TExt Representations, 2019, ECCV.
[39] Chris Callison-Burch, et al. Visual Goal-Step Inference using wikiHow, 2021, EMNLP.
[40] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[41] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[42] William Yang Wang, et al. WikiHow: A Large Scale Text Summarization Dataset, 2018, ArXiv.
[43] Nazli Ikizler-Cinbis, et al. RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes, 2018, EMNLP.
[44] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[45] Furu Wei, et al. BEiT: BERT Pre-Training of Image Transformers, 2021, ArXiv.
[46] Yu Guan, et al. Order Matters: Shuffling Sequence Generation for Video Prediction, 2019, ArXiv.
[47] Jianlong Fu, et al. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers, 2020, ArXiv.
[48] Steven Schockaert, et al. Learning Household Task Knowledge from WikiHow Descriptions, 2019, SemDeep@IJCAI.