Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections
Svetlana Lazebnik | Liwei Wang | Julia Hockenmaier | Micah Hodosh | Yunchao Gong
[1] H. Hotelling. Relations Between Two Sets of Variates , 1936 .
[2] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[3] Steven Bird,et al. NLTK: The Natural Language Toolkit , 2002, ACL.
[4] Antonio Torralba,et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.
[5] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.
[6] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[7] E. Loper,et al. NLTK: The Natural Language Toolkit , 2006, ACL 2006.
[8] Trevor Darrell,et al. Learning Visual Representations using Images with Captions , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[9] Alexei A. Efros,et al. Scene completion using millions of photographs , 2007, SIGGRAPH 2007.
[10] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[11] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn.
[12] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[13] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[14] Derek Hoiem,et al. Building text features for object image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Liang Lin,et al. I2T: Image Parsing to Text Description , 2010, Proceedings of the IEEE.
[16] Koen E. A. van de Sande,et al. Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[18] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res.
[19] Trevor Darrell,et al. Adapting Visual Category Models to New Domains , 2010, ECCV.
[20] Thomas Deselaers,et al. ClassCut for Unsupervised Class Segmentation , 2010, ECCV.
[21] Cordelia Schmid,et al. Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[22] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[23] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[24] Alexei A. Efros,et al. Data-driven visual similarity for cross-domain image matching , 2011, ACM Trans. Graph.
[25] Jason Weston,et al. WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.
[26] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[27] Tamara L. Berg,et al. Baby Talk: Understanding and Generating Image Descriptions , 2011 .
[28] Rama Chellappa,et al. Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.
[29] Yejin Choi,et al. Composing Simple Image Descriptions using Web-scale N-grams , 2011, CoNLL.
[30] Ernest Valveny,et al. Leveraging category-level labels for instance-level image retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[31] Yejin Choi,et al. Collective Generation of Natural Image Descriptions , 2012, ACL.
[32] Karl Stratos,et al. Midge: Generating Image Descriptions From Computer Vision Detections , 2012, EACL.
[33] Kilian Q. Weinberger,et al. From sBoW to dCoT marginalized encoders for text representation , 2012, CIKM '12.
[34] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[35] Matthieu Guillaumin,et al. Large-scale knowledge transfer for object localization in ImageNet , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[36] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[37] Sanja Fidler,et al. A Sentence Is Worth a Thousand Pixels , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[38] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.
[39] Michael Isard,et al. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.
[40] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res.
[41] Kristen Grauman,et al. Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation , 2013, ICML.
[42] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[43] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[44] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.