Embodied Language Grounding With 3D Visual Feature Representations
Mihir Prabhudesai | Katerina Fragkiadaki | Adam W. Harley | Hsiao-Yu Fish Tung | Syed Ashar Javed | Maximilian Sieb
[1] Bruno A. Olshausen,et al. Perception as an Inference Problem , 2013 .
[2] Pushmeet Kohli,et al. Vision-as-Inverse-Graphics: Obtaining a Rich 3D Explanation of a Scene from a Single Image , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).
[3] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Jason Weston,et al. Memory Networks , 2014, ICLR.
[5] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Geoffrey Zweig,et al. Language Models for Image Captioning: The Quirks and What Works , 2015, ACL.
[7] Michael P. Kaschak,et al. Grounding language in action , 2002, Psychonomic bulletin & review.
[8] Emmanuel Dupoux,et al. IntPhys 2019: A Benchmark for Visual Intuitive Physics Understanding , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] J. Gibson. The Ecological Approach to Visual Perception , 1979 .
[10] Ruslan Salakhutdinov,et al. Gated-Attention Readers for Text Comprehension , 2016, ACL.
[11] Bernt Schiele,et al. Translating Video Content to Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[12] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Jerome A. Feldman,et al. From Molecule to Metaphor - A Neural Theory of Language , 2006 .
[14] A. Glenberg,et al. Symbol Grounding and Meaning: A Comparison of High-Dimensional and Embodied Theories of Meaning , 2000 .
[15] Yuval Tassa,et al. Control-limited differential dynamic programming , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[16] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Trevor Darrell,et al. Modeling Relationships in Referential Expressions with Compositional Modular Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Greg Mori,et al. Probabilistic Neural Programmed Networks for Scene Generation , 2018, NeurIPS.
[19] Erwin Coumans,et al. Bullet physics simulation , 2015, SIGGRAPH Courses.
[20] Karen Emmorey,et al. Modulation of BOLD Response in Motion-sensitive Lateral Temporal Cortex by Real and Fictive Motion Sentences , 2010, Journal of Cognitive Neuroscience.
[21] B. Bergen. Experimental methods for simulation semantics , 2007 .
[22] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Katerina Fragkiadaki,et al. Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[24] Seong Joon Oh,et al. Generating Descriptions with Grounded and Co-referenced People , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] C. Lawrence Zitnick,et al. Learning Common Sense through Visual Abstraction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[26] J. Feldman,et al. Embodied meaning in a neural theory of language , 2004, Brain and Language.
[27] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Louis-Philippe Morency,et al. Using Syntax to Ground Referring Expressions in Natural Images , 2018, AAAI.
[29] Phil Blunsom,et al. Teaching Machines to Read and Comprehend , 2015, NIPS.
[30] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[31] Rudolf Kadlec,et al. Text Understanding with the Attention Sum Reader Network , 2016, ACL.
[32] Katerina Fragkiadaki,et al. Learning Spatial Common Sense With Geometry-Aware Recurrent Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.
[34] Slav Petrov,et al. Structured Training for Neural Network Transition-Based Parsing , 2015, ACL.
[35] B. Bergen. Mental Simulation in Spatial Language Processing , 2005 .
[36] Katerina Fragkiadaki,et al. Material for “Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision” , 2017 .
[37] Sergey Levine,et al. Learning Dexterous Manipulation Policies from Experience and Imitation , 2016, ArXiv.
[38] Chuang Gan,et al. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision , 2019, ICLR.
[39] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[40] Joshua B. Tenenbaum,et al. Deep Convolutional Inverse Graphics Network , 2015, NIPS.
[41] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[42] Jonathan Berant,et al. Semantic Parsing via Paraphrasing , 2014, ACL.
[43] B. Bergen. Embodiment, simulation and meaning , 2015 .
[44] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] R. Ivry,et al. Modulation of the FFA and PPA by Language Related to Faces and Places , 2022, Social Neuroscience.
[46] Daniel Casasanto,et al. Neural Dissociations between Action Verb Understanding and Motor Imagery , 2010, Journal of Cognitive Neuroscience.
[47] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[48] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[49] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.