Canonical Correlation Inference for Mapping Abstract Scenes to Text

We describe a technique for structured prediction based on canonical correlation analysis. Our learning algorithm finds two projections, one for the input space and one for the output space, such that a given input and its correct output are projected to points close to each other. We demonstrate the technique on a language-vision problem: generating a textual description of an "abstract scene".
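
As a rough illustration of the idea, the standard CCA construction can be written as below. The feature maps $\phi$ and $\psi$, the target dimension $m$, and the nearest-point decoding rule in the last line are assumptions introduced here for illustration; the abstract does not specify them.

```latex
% Sketch of standard CCA; \phi, \psi, m and the decoding rule are illustrative assumptions.
% Training pairs (x_i, y_i), i = 1..n, with centered feature vectors \phi(x_i), \psi(y_i).
\begin{align*}
C_{xx} &= \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^{\top}, \quad
C_{yy} = \frac{1}{n}\sum_{i=1}^{n} \psi(y_i)\psi(y_i)^{\top}, \quad
C_{xy} = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\psi(y_i)^{\top} \\
C_{xx}^{-1/2}\, C_{xy}\, C_{yy}^{-1/2} &\approx U_m \Sigma_m V_m^{\top}
  \quad \text{(rank-$m$ truncated SVD)} \\
u(x) &= U_m^{\top} C_{xx}^{-1/2}\,\phi(x), \qquad
v(y) = V_m^{\top} C_{yy}^{-1/2}\,\psi(y) \\
\hat{y}(x) &= \arg\min_{y}\ \lVert u(x) - v(y) \rVert_2
\end{align*}
```

Here $u$ and $v$ are the two learned projections, and prediction picks the output whose projection lies closest to that of the input. In practice the covariance matrices are commonly regularized before inversion, and the minimization over outputs would typically have to be approximate when the output space is large.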
