Weakly Supervised Action Learning with RNN Based Fine-to-Coarse Modeling
暂无分享,去创建一个
Juergen Gall | Alexander Richard | Hildegard Kuehne | Juergen Gall | Alexander Richard | Hilde Kuehne
[1] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[2] Cordelia Schmid,et al. Weakly Supervised Action Labeling in Videos under Ordering Constraints , 2014, ECCV.
[3] Fei-Fei Li,et al. Learning latent temporal structure for complex event detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[4] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[5] Juan Carlos Niebles,et al. Connectionist Temporal Modeling for Weakly Supervised Action Labeling , 2016, ECCV.
[6] Thomas Serre,et al. An end-to-end generative framework for video segmentation and recognition , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).
[7] Chen Sun,et al. Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames , 2016, ECCV.
[8] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Hermann Ney,et al. Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Ramakant Nevatia,et al. Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images , 2015, ACM Multimedia.
[12] Martial Hebert,et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.
[13] Thomas Serre,et al. The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[14] P J Webros. BACKPROPAGATION THROUGH TIME: WHAT IT DOES AND HOW TO DO IT , 1990 .
[15] Thomas Mensink,et al. Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.
[16] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[17] Cordelia Schmid,et al. Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[18] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.
[19] Gregory D. Hager,et al. Segmental Spatiotemporal CNNs for Fine-Grained Action Segmentation , 2016, ECCV.
[20] Wojciech Zaremba,et al. An Empirical Exploration of Recurrent Network Architectures , 2015, ICML.
[21] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[22] Silvio Savarese,et al. Watch-n-patch: Unsupervised understanding of actions and relations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Jean Ponce,et al. Automatic annotation of human actions in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[24] Thomas Mensink,et al. Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.
[25] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[26] Ivan Laptev,et al. Unsupervised Learning from Narrated Instruction Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[29] Andreas Stolcke,et al. Using a stochastic context-free grammar as a language model for speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[30] Christopher Joseph Pal,et al. Delving Deeper into Convolutional Networks for Learning Video Representations , 2015, ICLR.
[31] Ming Shao,et al. A Multi-stream Bi-directional Recurrent Neural Network for Fine-Grained Action Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Juergen Gall,et al. Temporal Action Detection Using a Statistical Language Model , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Li Fei-Fei,et al. End-to-End Learning of Action Detection from Frame Glimpses in Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[35] Xi Wang,et al. Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification , 2015, ACM Multimedia.
[36] Juergen Gall,et al. Weakly supervised learning of actions from transcripts , 2016, Comput. Vis. Image Underst..