Layout-Induced Video Representation for Recognizing Agent-in-Place Actions
暂无分享,去创建一个
[1] Takeo Kanade,et al. A System for Video Surveillance and Monitoring , 2000 .
[2] Cordelia Schmid,et al. Actor-Centric Relation Network , 2018, ECCV.
[3] Yihong Gong,et al. Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[4] Rahul Sukthankar,et al. Rethinking the Faster R-CNN Architecture for Temporal Action Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Svetlana Lazebnik,et al. Superparsing , 2010, International Journal of Computer Vision.
[7] Limin Wang,et al. Temporal Action Detection with Structured Segment Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[8] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[9] Fabio Tozeto Ramos,et al. Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).
[10] Martial Hebert,et al. Activity Forecasting , 2012, ECCV.
[11] Tao Mei,et al. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[12] Alexei A. Efros,et al. Scene completion using millions of photographs , 2008, Commun. ACM.
[13] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Larry S. Davis,et al. C-WSL: Count-guided Weakly Supervised Localization , 2017, ECCV.
[15] Jason J. Corso,et al. Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[16] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Yu-Gang Jiang,et al. Learning Semantic Feature Map for Visual Content Recognition , 2017, ACM Multimedia.
[18] Yuan-Fang Wang,et al. Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection , 2018, ACCV.
[19] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[20] Song-Chun Zhu,et al. Learning Human-Object Interactions by Graph Parsing Neural Networks , 2018, ECCV.
[21] Antonio Torralba,et al. Nonparametric Scene Parsing via Label Transfer , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[22] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.
[23] Larry S. Davis,et al. The Role of Context Selection in Object Detection , 2016, BMVC.
[24] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[25] Vittorio Ferrari,et al. COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[26] Shaogang Gong,et al. Discovery of Shared Semantic Spaces for Multiscene Video Query and Summarization , 2015, IEEE Transactions on Circuits and Systems for Video Technology.
[27] Alexei A. Efros,et al. Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.
[28] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[29] Yann LeCun,et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .
[31] Larry S. Davis,et al. Selective Encoding for Recognizing Unreliably Localized Faces , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[32] Shih-Fu Chang,et al. CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Hao Su,et al. Object Bank: An Object-Level Image Representation for High-Level Visual Recognition , 2014, International Journal of Computer Vision.
[35] Xu Zhao,et al. Single Shot Temporal Action Detection , 2017, ACM Multimedia.
[36] Catherine Achard,et al. The DAily Home LIfe Activity Dataset: A High Semantic Activity Dataset for Online Recognition , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
[37] Kate Saenko,et al. R-C3D: Region Convolutional 3D Network for Temporal Activity Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[39] D. Burns,et al. End To End , 2015 .
[40] Jonghyun Choi,et al. Learning Temporal Regularity in Video Sequences , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Larry S. Davis,et al. Generating Holistic 3D Scene Abstractions for Text-Based Image Retrieval , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Trevor Darrell,et al. The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[43] Yihong Gong,et al. Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[44] Chenliang Xu,et al. Actor-Action Semantic Segmentation with Grouping Process Models , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Wei Shen,et al. Spatial-temporal convolutional neural networks for anomaly detection and localization in crowded scenes , 2016, Signal Process. Image Commun..
[46] Larry S. Davis,et al. W4: Real-Time Surveillance of People and Their Activities , 2000, IEEE Trans. Pattern Anal. Mach. Intell..
[47] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[48] Bernard Ghanem,et al. End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos , 2017, BMVC.
[49] Ghassan Al-Regib,et al. TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition , 2017, Signal Process. Image Commun..
[50] Nicu Sebe,et al. Learning Deep Representations of Appearance and Motion for Anomalous Event Detection , 2015, BMVC.
[51] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[52] Yann LeCun,et al. Convolutional Learning of Spatio-temporal Features , 2010, ECCV.
[53] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[54] Larry S. Davis,et al. ReMotENet: Efficient Relevant Motion Event Detection for Large-Scale Home Surveillance Videos , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[55] Antonio Torralba,et al. A Data-Driven Approach for Event Prediction , 2010, ECCV.
[56] Haroon Idrees,et al. The THUMOS challenge on action recognition for videos "in the wild" , 2016, Comput. Vis. Image Underst..
[57] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[59] Asim Kadav,et al. Attend and Interact: Higher-Order Object Interactions for Video Understanding , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[60] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[61] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[62] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.
[63] Koen E. A. van de Sande,et al. Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.
[64] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[65] Thomas Mensink,et al. Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.
[66] Larry S. Davis,et al. Dynamic Zoom-in Network for Fast Object Detection in Large Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[67] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Larry S. Davis,et al. Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[69] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[70] Bernt Schiele,et al. Recognizing Fine-Grained and Composite Activities Using Hand-Centric Features and Script Data , 2015, International Journal of Computer Vision.
[71] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[72] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[73] François Brémond,et al. Video understanding for complex activity recognition , 2006, Machine Vision and Applications.
[74] Alberto Del Bimbo,et al. A data-driven approach for tag refinement and localization in web videos , 2015, Comput. Vis. Image Underst..
[75] Silvio Savarese,et al. Knowledge Transfer for Scene-Specific Motion Prediction , 2016, ECCV.