Oops! Predicting Unintentional Action in Video
Boyuan Chen | Carl Vondrick | Dave Epstein
[1] Alexander Kolesnikov,et al. Revisiting Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Andrew Zisserman,et al. Video Representation Learning by Dense Predictive Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[3] Guangchun Cheng,et al. Advances in Human Action Recognition: A Survey , 2015, ArXiv.
[4] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Yunde Jia,et al. Parsing video events with goal inference and intent prediction , 2011, 2011 International Conference on Computer Vision.
[6] Antonio Torralba,et al. Generating Videos with Scene Dynamics , 2016, NIPS.
[7] Zachary C. Burns,et al. Slow motion increases perceived intent , 2016, Proceedings of the National Academy of Sciences.
[8] Wojciech Matusik,et al. Gaze360: Physically Unconstrained Gaze Estimation in the Wild , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Antonio Torralba,et al. Where are they looking? , 2015, NIPS.
[10] Amanda C. Brandone,et al. You Can't Always Get What You Want , 2009, Psychological science.
[11] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[12] Quan Z. Sheng,et al. Online human gesture recognition from motion data streams , 2013, ACM Multimedia.
[13] Antonio Torralba,et al. Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] Ming-Hsuan Yang,et al. Unsupervised Representation Learning by Sorting Sequences , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[16] Gregory Shakhnarovich,et al. Colorization as a Proxy Task for Visual Understanding , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Jitendra Malik,et al. View Synthesis by Appearance Flow , 2016, ECCV.
[18] Jason J. Corso,et al. Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[19] Yang Wang,et al. Back to the Future: Knowledge Distillation for Human Action Anticipation , 2019, ArXiv.
[20] Qing Lei,et al. A Comprehensive Survey of Vision-Based Human Action Recognition Methods , 2019, Sensors.
[21] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.
[22] A. Woodward. Infants' ability to distinguish between purposeful and non-purposeful behaviors , 1999 .
[23] Jonathan Tompson,et al. Temporal Cycle-Consistency Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Wonjun Hwang,et al. Self-Supervised Spatio-Temporal Representation Learning Using Variable Playback Speed Prediction , 2020, ArXiv.
[25] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Abhinav Gupta,et al. Unsupervised Learning of Visual Representations Using Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[27] Michael E. Bratman,et al. Intention, Plans, and Practical Reason , 1991 .
[28] Andrew Zisserman,et al. Learning and Using the Arrow of Time , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Sergio Guadarrama,et al. Tracking Emerges by Colorizing Videos , 2018, ECCV.
[30] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[31] Rémi Ronfard,et al. A survey of vision-based methods for action representation, segmentation and recognition , 2011, Comput. Vis. Image Underst..
[32] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[33] David A. Forsyth,et al. Utility data annotation with Amazon Mechanical Turk , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[34] Bolei Zhou,et al. Moments in Time Dataset: One Million Videos for Event Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Cordelia Schmid,et al. Learning Video Representations using Contrastive Bidirectional Transformer , 2019 .
[36] Xiaoou Tang,et al. Action Recognition and Detection by Combining Motion and Appearance Features , 2014 .
[37] Gang Yu,et al. Predicting human activities using spatio-temporal structure of interest points , 2012, ACM Multimedia.
[38] Michael S. Ryoo,et al. Human activity prediction: Early recognition of ongoing activities from streaming videos , 2011, 2011 International Conference on Computer Vision.
[39] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.
[40] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[41] Xueting Li,et al. Joint-task Self-supervised Learning for Temporal Correspondence , 2019, NeurIPS.
[42] A. Woodward. Infants' Grasp of Others' Intentions , 2009, Current directions in psychological science.
[43] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.
[44] Deva Ramanan,et al. Parsing Videos of Actions with Segmental Grammars , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[45] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[46] Jiajun Wu,et al. Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks , 2016, NIPS.
[47] Ersin Yumer,et al. Self-supervised Learning of Motion Capture , 2017, NIPS.
[48] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[49] Charless C. Fowlkes,et al. The Open World of Micro-Videos , 2016, ArXiv.
[50] Allan Jabri,et al. Learning Correspondence From the Cycle-Consistency of Time , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Barbara Caputo,et al. Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..
[52] Kristen Grauman,et al. Learning Image Representations Tied to Ego-Motion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[53] Yueting Zhuang,et al. Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[55] A. L. Woodward. Infants' Grasp of Others' Intentions , 2009 .
[56] Thomas Brox,et al. Learning Representations for Predicting Future Activities , 2019, ArXiv.
[57] Cordelia Schmid,et al. A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.
[58] Thomas Brox,et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Zihang Lai,et al. Self-supervised Learning for Video Correspondence Flow , 2019, ArXiv.
[60] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Ivan Laptev,et al. On Space-Time Interest Points , 2005, International Journal of Computer Vision.
[62] Cordelia Schmid,et al. Action recognition by dense trajectories , 2011, CVPR 2011.
[63] Fernando De la Torre,et al. Max-Margin Early Event Detectors , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[64] Yutaka Satoh,et al. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[65] Jitendra Malik,et al. From Lifestyle Vlogs to Everyday Interactions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[66] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[67] Noah Snavely,et al. Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Sergio Escalera,et al. A Survey on Deep Learning Based Approaches for Action and Gesture Recognition in Image Sequences , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
[69] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[70] J.K. Aggarwal,et al. Human activity analysis , 2011, ACM Comput. Surv..
[71] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[72] Yann LeCun,et al. Deep multi-scale video prediction beyond mean square error , 2015, ICLR.
[73] Jitendra Malik,et al. Learning to See by Moving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[74] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[75] Bolei Zhou,et al. Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[76] Cordelia Schmid,et al. Contrastive Bidirectional Transformer for Temporal Representation Learning , 2019, ArXiv.
[77] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[78] Ronald Poppe,et al. A survey on vision-based human action recognition , 2010, Image Vis. Comput..
[79] Ronen Basri,et al. Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[80] Dima Damen,et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset , 2018, ArXiv.
[81] Efstratios Gavves,et al. Self-Supervised Video Representation Learning with Odd-One-Out Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[82] Richard P. Wildes,et al. Review of Action Recognition and Detection Methods , 2016, ArXiv.
[83] Yoshua Bengio,et al. Learning deep representations by mutual information estimation and maximization , 2018, ICLR.
[84] Martial Hebert,et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.