Action Modifiers: Learning From Adverbs in Instructional Videos
[1] Ivan Laptev,et al. Joint Discovery of Object States and Manipulation Actions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[2] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Ramakant Nevatia,et al. Motion-Appearance Co-memory Networks for Video Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Chenliang Xu,et al. Towards Automatic Learning of Procedures From Web Instructional Videos , 2017, AAAI.
[5] Juan Carlos Niebles,et al. Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Jean Ponce,et al. Automatic annotation of human actions in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[7] Dima Damen,et al. Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] Wei Chen,et al. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework , 2015, AAAI.
[9] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[10] Dima Damen,et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset , 2018, ArXiv.
[11] Silvio Savarese,et al. Recognizing human actions by attributes , 2011, CVPR 2011.
[12] Qiang Ji,et al. A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects , 2013, 2013 IEEE International Conference on Computer Vision.
[13] Yejin Choi,et al. Zero-Shot Activity Recognition with Verb Attribute Induction , 2017, EMNLP.
[14] Amit K. Roy-Chowdhury,et al. W-TALC: Weakly-supervised Temporal Activity Localization and Classification , 2018, ECCV.
[15] Andrew Zisserman,et al. "Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video , 2006, BMVC.
[16] Rainer Stiefelhagen,et al. Book2Movie: Aligning video scenes with book chapters , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Ivan Laptev,et al. Cross-Task Weakly Supervised Learning From Instructional Videos , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Amit K. Roy-Chowdhury,et al. Weakly Supervised Video Moment Retrieval From Text Queries , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Xiao Liu,et al. Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] Chunheng Wang,et al. Weakly Supervised Soft-detection-based Aggregation Method for Image Retrieval , 2018, ArXiv.
[21] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[22] Martial Hebert,et al. From Red Wine to Red Tomato: Composition with Context , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Yang Wang,et al. A Discriminative Latent Model of Object Classes and Attributes , 2010, ECCV.
[24] Tao Mei,et al. Video Captioning with Transferred Semantic Attributes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Rongrong Ji,et al. Large-scale visual sentiment ontology and detectors using adjective noun pairs , 2013, ACM Multimedia.
[26] Kate Saenko,et al. LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval , 2019, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[27] Cewu Lu,et al. Human Action Adverb Recognition: ADHA Dataset and a Three-Stream Hybrid Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[28] Dima Damen,et al. Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Juergen Gall,et al. Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] Amy Beth Warriner,et al. Concreteness ratings for 40 thousand generally known English word lemmas , 2014, Behavior research methods.
[31] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[32] Yoichi Sato,et al. Manipulation-Skill Assessment from Videos with Spatial Attention Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[33] Nanning Zheng,et al. Recognizing Unseen Attribute-Object Pair with Generative Model , 2019, AAAI.
[34] Amir Rosenfeld,et al. Action Classification via Concepts and Attributes , 2016, 2018 24th International Conference on Pattern Recognition (ICPR).
[35] Chunheng Wang,et al. Adversarial Soft-detection-based Aggregation Network for Image Retrieval , 2018, arXiv:1811.07619.
[36] Fadime Sener,et al. Zero-Shot Anticipation for Instructional Activities , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[37] Dima Damen,et al. The Pros and Cons: Rank-Aware Temporal Attention for Skill Determination in Long Videos , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Ivan Laptev,et al. Learnable pooling with Context Gating for video classification , 2017, ArXiv.
[39] Yi Yang,et al. Uncovering the Temporal Context for Video Question Answering , 2017, International Journal of Computer Vision.
[40] Kristen Grauman,et al. Inferring Analogous Attributes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[41] Tanel Alumäe,et al. Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration , 2016, INTERSPEECH.
[42] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Jongwook Choi,et al. End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Silvio Savarese,et al. Unsupervised Semantic Parsing of Video Collections , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[45] Juan Carlos Niebles,et al. Title Generation for User Generated Videos , 2016, ECCV.
[46] Tomás Pajdla,et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[47] Daochang Liu,et al. Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Cees Snoek,et al. Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[49] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[50] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[51] James M. Rehg,et al. Modeling Actions through State Changes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[52] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[53] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[54] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[55] Juan Carlos Niebles,et al. Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[57] Juan Carlos Niebles,et al. Connectionist Temporal Modeling for Weakly Supervised Action Labeling , 2016, ECCV.
[58] Cees Snoek,et al. Objects2action: Classifying and Localizing Actions without Any Video Example , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[59] Ivan Laptev,et al. Unsupervised Learning from Narrated Instruction Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Edward H. Adelson,et al. Discovering states and transformations in image collections , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Kristen Grauman,et al. Attributes as Operators , 2018, ECCV.