The “Something Something” Video Database for Learning and Evaluating Visual Common Sense
Susanne Westphal | Ingo Bax | Roland Memisevic | Samira Ebrahimi Kahou | Vincent Michalski | Christian Thurau | Heuna Kim | Raghav Goyal | Valentin Haenel | Joanna Materzynska | Ingo Fründ | Peter Yianilos | Moritz Mueller-Freitag | Florian Hoppe