The Gavagai Living Lexicon

This paper presents the Gavagai Living Lexicon, an online distributional semantic model currently available in 20 languages. We describe the underlying distributional semantic model and how we have solved some of the challenges of applying such a model to large amounts of streaming data. We also describe the architecture of our implementation and discuss how we handle continuous quality assurance of the lexicon.
