Towards Automatic Learning of Procedures From Web Instructional Videos
暂无分享,去创建一个
[1] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[2] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[3] Bernt Schiele,et al. Grounding Action Descriptions in Videos , 2013, TACL.
[4] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[5] Shih-Fu Chang,et al. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[8] Cordelia Schmid,et al. Temporal Localization of Actions with Actoms. , 2013, IEEE transactions on pattern analysis and machine intelligence.
[9] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[10] Juan Carlos Niebles,et al. Connectionist Temporal Modeling for Weakly Supervised Action Labeling , 2016, ECCV.
[11] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[12] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Chenliang Xu,et al. Watch What You Just Said: Image Captioning with Text-Conditional Attention , 2016, ACM Multimedia.
[14] Juergen Gall,et al. Weakly supervised learning of actions from transcripts , 2016, Comput. Vis. Image Underst..
[15] Stephen J. McKenna,et al. Combining embedded accelerometers with computer vision for recognizing food preparation activities , 2013, UbiComp.
[16] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[17] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[18] Thomas Serre,et al. The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[19] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[20] Ivan Laptev,et al. Unsupervised Learning from Narrated Instruction Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Anthony G. Cohn,et al. Natural Language Acquisition and Grounding for Embodied Robotic Systems , 2017, AAAI.
[22] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[25] Silvio Savarese,et al. Unsupervised Semantic Parsing of Video Collections , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[26] Bernard Ghanem,et al. DAPs: Deep Action Proposals for Action Understanding , 2016, ECCV.
[27] Cordelia Schmid,et al. Weakly-Supervised Alignment of Video with Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[28] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[29] Ke Zhang,et al. Video Summarization with Long Short-Term Memory , 2016, ECCV.
[30] Chenliang Xu,et al. Image Caption Generation with Text-Conditional Semantic Attention , 2016, ArXiv.
[31] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Thomas Serre,et al. An end-to-end generative framework for video segmentation and recognition , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).
[33] Cordelia Schmid,et al. Weakly Supervised Action Labeling in Videos under Ordering Constraints , 2014, ECCV.
[34] Jeffrey Mark Siskind,et al. Learning to Describe Video with Weak Supervision by Exploiting Negative Sentential Information , 2015, AAAI.
[35] Jonathan W. Pillow,et al. Discovering Event Structure in Continuous Narrative Perception and Memory , 2016, Neuron.
[36] Li Fei-Fei,et al. End-to-End Learning of Action Detection from Frame Glimpses in Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Ming Shao,et al. A Multi-stream Bi-directional Recurrent Neural Network for Fine-Grained Action Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Wei Chen,et al. Actionness Ranking with Lattice Conditional Ordinal Random Fields , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[39] Cordelia Schmid,et al. Action and Event Recognition with Fisher Vectors on a Compact Feature Set , 2013, 2013 IEEE International Conference on Computer Vision.
[40] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.