Video Captioning with Transferred Semantic Attributes
暂无分享,去创建一个
Tao Mei | Houqiang Li | Yingwei Pan | Ting Yao | Tao Mei | Ting Yao | Yingwei Pan | Houqiang Li
[1] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[3] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[4] Jorma Laaksonen,et al. Video captioning with recurrent networks based on frame- and video-level features and visual content classification , 2015, ArXiv.
[5] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Yi Yang,et al. Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Bernt Schiele,et al. Translating Video Content to Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[8] Wei Chen,et al. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework , 2015, AAAI.
[9] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[10] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[11] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[12] Tao Mei,et al. Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure , 2016, IJCAI.
[13] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[14] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[17] Kunio Fukunaga,et al. Natural Language Description of Human Activities from Video Images Based on Concept Hierarchy of Actions , 2002, International Journal of Computer Vision.
[18] Yi Yang,et al. You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[20] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[21] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[22] Jian Sun,et al. Rich Image Captioning in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[23] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[24] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[25] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[26] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[27] Chunhua Shen,et al. What Value Do Explicit High Level Concepts Have in Vision to Language Problems? , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Kristen Grauman,et al. Relative attributes , 2011, 2011 International Conference on Computer Vision.
[30] Bernt Schiele,et al. The Long-Short Story of Movie Description , 2015, GCPR.
[31] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[32] Paul A. Viola,et al. Multiple Instance Boosting for Object Detection , 2005, NIPS.
[33] Christopher Joseph Pal,et al. Delving Deeper into Convolutional Networks for Learning Video Representations , 2015, ICLR.
[34] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[35] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Christopher Joseph Pal,et al. Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research , 2015, ArXiv.
[41] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.