Distributed Representations of Geographically Situated Language

We introduce a model for incorporating contextual information (such as geography) in learning vector-space representations of situated language. In contrast to approaches to multimodal representation learning that have used properties of the object being described (such as its color), our model includes information about the subject (i.e., the speaker), allowing us to learn the contours of a word’s meaning that are shaped by the context in which it is uttered. In a quantitative evaluation on the task of judging geographically informed semantic similarity between representations learned from 1.1 billion words of geo-located tweets, our joint model outperforms comparable independent models that learn meaning in isolation.
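As a rough illustration of what a geographically informed similarity judgment could look like, the sketch below composes a word's representation from a shared vector plus a region-specific offset and compares words by cosine similarity. The additive decomposition, the names `shared`, `offsets`, and `situated`, the region codes, and all vector values are assumptions made for this toy example; the abstract does not specify the model's parameterization.

```python
import math

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy parameters (not taken from the paper):
# one shared vector per word, plus optional per-region offsets.
shared = {
    "wicked": [1.0, 0.0],
    "evil": [1.0, 0.1],
    "very": [0.0, 1.0],
}
offsets = {
    "MA": {"wicked": [-0.9, 1.0]},  # assumed regional shift in usage
}

def situated(word, region):
    """Return the shared vector plus the region's offset, if one exists."""
    offset = offsets.get(region, {}).get(word)
    base = shared[word]
    return add(base, offset) if offset is not None else base

for region in ("MA", "KS"):
    vec = situated("wicked", region)
    print(region,
          "sim(wicked, very) =", round(cosine(vec, shared["very"]), 3),
          "sim(wicked, evil) =", round(cosine(vec, shared["evil"]), 3))
```

In this toy setup the same word lands near different neighbors depending on the region attached to the utterance, which is the kind of contrast a geographically informed similarity evaluation would probe.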
