Deep visual-semantic alignments for generating image descriptions
暂无分享,去创建一个
[1] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Yejin Choi,et al. Collective Generation of Natural Image Descriptions , 2012, ACL.
[3] Karl Stratos,et al. Midge: Generating Image Descriptions From Computer Vision Detections , 2012, EACL.
[4] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[5] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[6] Yejin Choi,et al. TreeTalk: Composition and Compression of Trees for Image Descriptions , 2014, TACL.
[7] Sanja Fidler,et al. What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[8] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[10] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[11] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[12] Yiannis Aloimonos,et al. Corpus-Guided Sentence Generation of Natural Images , 2011, EMNLP.
[13] Frank Keller,et al. Image Description using Visual Dependency Representations , 2013, EMNLP.
[14] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[15] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[16] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[17] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[18] Hannu Toivonen,et al. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies , 1999, Comput. J..
[19] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[20] Rosine Cicchetti,et al. FUN: An Efficient Algorithm for Mining Functional and Embedded Dependencies , 2001, ICDT.
[21] Luke S. Zettlemoyer,et al. See No Evil, Say No Evil: Description Generation from Densely Labeled Images , 2014, *SEMEVAL.
[22] Saurabh Gupta,et al. Exploring Nearest Neighbor Approaches for Image Captioning , 2015, ArXiv.
[23] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[24] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[25] Lucy Vanderwende,et al. Learning the Visual Interpretation of Sentences , 2013, 2013 IEEE International Conference on Computer Vision.
[26] Peter A. Flach,et al. Database Dependency Discovery: A Machine Learning Approach , 1999, AI Commun..
[27] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.
[28] P. Perona,et al. What do we perceive in a glance of a real-world scene? , 2007, Journal of vision.
[29] Weixiong Zhang. State-space search - algorithms, complexity, extensions, and applications , 1999 .
[30] Yejin Choi,et al. Composing Simple Image Descriptions using Web-scale N-grams , 2011, CoNLL.
[31] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[32] Luke S. Zettlemoyer,et al. A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.
[33] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[34] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[35] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[36] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[37] Xinlei Chen,et al. Learning a Recurrent Visual Representation for Image Caption Generation , 2014, ArXiv.
[38] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[39] Sanja Fidler,et al. A Sentence Is Worth a Thousand Pixels , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[40] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[41] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..
[42] Sanja Fidler,et al. Visual Semantic Search: Retrieving Videos via Complex Textual Queries , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[43] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..
[44] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[45] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[46] Liang Lin,et al. I2T: Image Parsing to Text Description , 2010, Proceedings of the IEEE.
[47] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[48] David A. Forsyth,et al. Matching Words and Pictures , 2003, J. Mach. Learn. Res..
[49] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[50] Sven J. Dickinson,et al. Video In Sentences Out , 2012, UAI.
[51] Fei-Fei Li,et al. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[52] Yoshua Bengio,et al. Neural Probabilistic Language Models , 2006 .
[53] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[54] Fei-Fei Li,et al. What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[55] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Cory J. Butz,et al. FD/spl I.bar/Mine: discovering functional dependencies in a database using equivalences , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..
[57] Stephen Gould,et al. Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[58] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[59] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[60] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.
[61] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[62] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[63] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[64] Ankush Gupta,et al. From Image Annotation to Image Description , 2012, ICONIP.
[65] Trevor Darrell,et al. Learning cross-modality similarity for multinomial data , 2011, 2011 International Conference on Computer Vision.
[66] Simon Haykin,et al. GradientBased Learning Applied to Document Recognition , 2001 .
[67] Yee Whye Teh,et al. Names and faces in the news , 2004, CVPR 2004.
[68] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[69] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[70] Li Fei-Fei,et al. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[71] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.
[72] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[73] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).