Long-term recurrent convolutional networks for visual recognition and description
暂无分享,去创建一个
Trevor Darrell | Marcus Rohrbach | Kate Saenko | Jeff Donahue | Subhashini Venugopalan | Sergio Guadarrama | Lisa Anne Hendricks | Trevor Darrell | Jeff Donahue | S. Guadarrama | Subhashini Venugopalan | Marcus Rohrbach | Kate Saenko
[1] Miss A.O. Penney. (b) , 1974, The New Yale Book of Quotations.
[2] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[3] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[4] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[5] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[6] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[7] Thomas Brox,et al. High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.
[8] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[9] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.
[10] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[11] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[12] Christian Wolf,et al. Action Classification in Soccer Videos with Long Short-Term Memory Recurrent Neural Networks , 2010, ICANN.
[13] Christian Wolf,et al. Sequential Deep Learning for Human Action Recognition , 2011, HBU.
[14] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[15] Chong-Wah Ngo,et al. Towards textually describing complex video contents with audio-visual concept classifiers , 2011, ACM Multimedia.
[16] Yiannis Aloimonos,et al. Corpus-Guided Sentence Generation of Natural Images , 2011, EMNLP.
[17] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.
[18] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[19] Lei Zhang,et al. Human Focused Video Description , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).
[20] Yejin Choi,et al. Collective Generation of Natural Image Descriptions , 2012, ACL.
[21] Karl Stratos,et al. Midge: Generating Image Descriptions From Computer Vision Detections , 2012, EACL.
[22] Sven J. Dickinson,et al. Video In Sentences Out , 2012, UAI.
[23] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[24] Cordelia Schmid,et al. Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.
[25] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[26] Daniel Povey,et al. Revisiting Recurrent Neural Networks for robust ASR , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[28] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[29] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[30] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[31] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[32] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[33] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.
[34] Bernt Schiele,et al. Translating Video Content to Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[35] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[36] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[37] Wojciech Zaremba,et al. Learning to Execute , 2014, ArXiv.
[38] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[39] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[40] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[41] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[42] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[43] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[44] Yejin Choi,et al. TreeTalk: Composition and Compression of Trees for Image Descriptions , 2014, TACL.
[45] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[46] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[47] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[48] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[49] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[50] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[51] Bernt Schiele,et al. Coherent Multi-sentence Video Description with Variable Level of Detail , 2014, GCPR.
[52] Georg Heigold,et al. Sequence discriminative distributed training of long short-term memory recurrent neural networks , 2014, INTERSPEECH.
[53] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[54] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[55] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.
[56] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Saurabh Gupta,et al. Exploring Nearest Neighbor Approaches for Image Captioning , 2015, ArXiv.
[59] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[60] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[62] Lisa Anne Hendricks,et al. Long-term recurrent convolutional networks for visual recognition and description , 2015, CVPR.
[63] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[64] Marcus Rohrbach,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[65] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[68] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[69] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[70] Xu Wei,et al. Learning Like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[71] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[72] Geoffrey Zweig,et al. Language Models for Image Captioning: The Quirks and What Works , 2015, ACL.
[73] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[74] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[75] Trevor Darrell,et al. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Li Fei-Fei,et al. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos , 2015, International Journal of Computer Vision.