[1] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[2] Fergal Cotter,et al. Probabilistic Future Prediction for Video Scene Understanding , 2020, ECCV.
[3] Juan Carlos Niebles,et al. Spatio-Temporal Graph for Video Captioning With Knowledge Distillation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[5] Pietro Perona,et al. Visual Causal Feature Learning , 2014, UAI.
[6] Li Fei-Fei,et al. Reasoning about Object Affordances in a Knowledge Base Representation , 2014, ECCV.
[7] Joyce Yue Chai,et al. Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches , 2019, ArXiv.
[8] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Yejin Choi,et al. Commonsense Reasoning for Natural Language Processing , 2020, ACL.
[10] Wu Liu,et al. T-C3D: Temporal Convolutional 3D Network for Real-Time Action Recognition , 2018, AAAI.
[11] Hanwang Zhang,et al. Two Causal Principles for Improving Visual Dialog , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[13] C. Lawrence Zitnick,et al. Learning Common Sense through Visual Abstraction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[14] Hanwang Zhang,et al. More Grounded Image Captioning by Distilling Image-Text Matching Model , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Danfei Xu,et al. Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Vlado Menkovski,et al. Causal Discovery from Incomplete Data: A Deep Learning Approach , 2020, ArXiv.
[17] Juan Carlos Niebles,et al. Leveraging Video Descriptions to Learn Video Question Answering , 2016, AAAI.
[18] Wei Liu,et al. Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Luowei Zhou,et al. End-to-End Dense Video Captioning with Masked Transformer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] Mélanie Frappier,et al. The Book of Why: The New Science of Cause and Effect , 2018, Science.
[21] Aman Chadha,et al. iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering , 2020, ArXiv.
[22] Trevor Darrell,et al. Women also Snowboard: Overcoming Bias in Captioning Models , 2018, ECCV.
[23] Alexandros G. Dimakis,et al. CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training , 2017, ICLR.
[24] 知秀 柴田. Understand It in 5 Minutes!? A Quick Read of Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .
[25] J. Pearl,et al. Causal Inference in Statistics: A Primer , 2016 .
[26] In So Kweon,et al. Video Panoptic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Larry S. Davis,et al. Explicit Bias Discovery in Visual Question Answering Models , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[30] Jianqiang Huang,et al. Unbiased Scene Graph Generation From Biased Training , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] J. Pearl. Interpretation and Identification of Causal Mediation , 2013, Psychological methods.
[32] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[33] Hongming Zhang,et al. Learning Contextual Causality from Time-consecutive Images , 2020, ArXiv.
[34] Trevor Darrell,et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[35] Ali Farhadi,et al. Stating the Obvious: Extracting Visual Common Sense Knowledge , 2016, NAACL.
[36] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[37] Xiao Lin,et al. Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Silvio Savarese,et al. Causal Induction from Visual Observations for Goal Directed Tasks , 2019, ArXiv.
[39] Bernhard Schölkopf. Causality for Machine Learning , 2019 .
[40] Yutaka Kidawara,et al. Toward Future Scenario Generation: Extracting Event Causality Exploiting Semantic Relation, Context, and Association Features , 2014, ACL.
[41] Chitta Baral,et al. From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge , 2015, ArXiv.
[42] Hao Wu,et al. Joint Reasoning for Temporal and Causal Relations , 2018, ACL.
[43] Hanwang Zhang,et al. Visual Commonsense R-CNN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[45] Richard Socher,et al. Explain Yourself! Leveraging Language Models for Commonsense Reasoning , 2019, ACL.
[46] Bernhard Schölkopf,et al. Discovering Causal Signals in Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Ben Goertzel,et al. Artificial General Intelligence: Concept, State of the Art, and Future Prospects , 2009, J. Artif. Gen. Intell..
[48] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[49] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[50] Chitta Baral,et al. Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning , 2020, EMNLP.
[51] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[52] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[53] Jong-Hoon Oh,et al. Why-Question Answering using Intra- and Inter-Sentential Causal Relations , 2013, ACL.
[54] Xinlei Chen,et al. Towards VQA Models That Can Read , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).