Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction
暂无分享,去创建一个
[1] Roberto Cipolla,et al. Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Yong Jae Lee,et al. Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Asim Kadav,et al. Attend and Interact: Higher-Order Object Interactions for Video Understanding , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[7] Seong Joon Oh,et al. Generating Descriptions with Grounded and Co-referenced People , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Thomas Deselaers,et al. Weakly Supervised Localization and Learning with Generic Knowledge , 2012, International Journal of Computer Vision.
[9] Anthony G. Cohn,et al. Natural Language Acquisition and Grounding for Embodied Robotic Systems , 2017, AAAI.
[10] Ali Farhadi,et al. Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[11] Luowei Zhou,et al. End-to-End Dense Video Captioning with Masked Transformer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Bingbing Ni,et al. Progressively Parsing Interactional Objects for Fine Grained Action Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[14] Zaïd Harchaoui,et al. On learning to localize objects with minimal supervision , 2014, ICML.
[15] Jean Ponce,et al. Unsupervised Object Discovery and Tracking in Video Collections , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[17] Jeffrey Mark Siskind,et al. Sentence Directed Video Object Codiscovery , 2017, International Journal of Computer Vision.
[18] Robinson Piramuthu,et al. Conditional Image-Text Embedding Networks , 2017, ECCV.
[19] Deva Ramanan,et al. Efficiently Scaling up Crowdsourced Video Annotation , 2012, International Journal of Computer Vision.
[20] Licheng Yu,et al. MAttNet: Modular Attention Network for Referring Expression Comprehension , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[21] Chenliang Xu,et al. Towards Automatic Learning of Procedures From Web Instructional Videos , 2017, AAAI.
[22] Ivan Laptev,et al. Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Gregory D. Hager,et al. Segmental Spatiotemporal CNNs for Fine-Grained Action Segmentation , 2016, ECCV.
[24] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[25] Louis-Philippe Morency,et al. Using Syntax to Ground Referring Expressions in Natural Images , 2018, AAAI.
[26] Peter Stone,et al. Opportunistic Active Learning for Grounding Natural Language Descriptions , 2017, CoRL.
[27] Cordelia Schmid,et al. Multi-fold MIL Training for Weakly Supervised Object Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Juan Carlos Niebles,et al. Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Cordelia Schmid,et al. Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[30] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.