Translating Video Content to Natural Language Descriptions
Bernt Schiele | Marcus Rohrbach | Ivan Titov | Wei Qiu | Stefan Thater | Manfred Pinkal
[1] Larry S. Davis, et al. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos, 2009, CVPR.
[2] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.
[3] Mauro Cettolo, et al. IRSTLM: an open source toolkit for handling large scale language models, 2008, INTERSPEECH.
[4] Cyrus Rashtchian, et al. Every Picture Tells a Story: Generating Sentences from Images, 2010, ECCV.
[5] Chong-Wah Ngo, et al. Towards textually describing complex video contents with audio-visual concept classifiers, 2011, ACM Multimedia.
[6] Bernt Schiele, et al. Grounding Action Descriptions in Videos, 2013, TACL.
[7] Bernt Schiele, et al. Script Data for Attribute-Based Recognition of Composite Activities, 2012, ECCV.
[8] Hermann Ney, et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.
[9] Vicente Ordonez, et al. Im2Text: Describing Images Using 1 Million Captioned Photographs, 2011, NIPS.
[10] Yejin Choi, et al. Baby talk: Understanding and generating simple image descriptions, 2011, CVPR.
[11] Chris Callison-Burch, et al. Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Lattice Decoding, 2006.
[12] Karl Stratos, et al. Midge: Generating Image Descriptions From Computer Vision Detections, 2012, EACL.
[13] Klamer Schutte, et al. Automated Textual Descriptions for a Wide Range of Video Events with 48 Human Actions, 2012, ECCV Workshops.
[14] David A. Forsyth, et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, 2002, ECCV.
[15] Cordelia Schmid, et al. Dense Trajectories and Motion Boundary Descriptors for Action Recognition, 2013, International Journal of Computer Vision.
[16] Yejin Choi, et al. Collective Generation of Natural Image Descriptions, 2012, ACL.
[17] Kunio Fukunaga, et al. Natural Language Description of Human Activities from Video Images Based on Concept Hierarchy of Actions, 2002, International Journal of Computer Vision.
[18] Yansong Feng, et al. How Many Words Is a Picture Worth? Automatic Caption Generation for News Images, 2010, ACL.
[19] Chenliang Xu, et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching, 2013, CVPR.
[20] Lei Zhang, et al. Human Focused Video Description, 2011, ICCV Workshops.
[21] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[22] Philipp Koehn, et al. Statistical Machine Translation, 2010, EAMT.
[23] Sven J. Dickinson, et al. Video In Sentences Out, 2012, UAI.
[24] Ahmet Aker, et al. Generating Image Descriptions Using Dependency Relational Patterns, 2010, ACL.
[25] Trevor Darrell, et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition, 2013, ICCV.
[26] Fei-Fei Li, et al. Video Event Understanding Using Natural Language Descriptions, 2013, ICCV.