PACS: A Dataset for Physical Audiovisual CommonSense Reasoning