Building Web-Interfaces for Vector Semantic Models with the WebVectors Toolkit

We present WebVectors, a toolkit that facilitates the use of distributional semantic models in everyday research. The toolkit has two main features: it allows users to build web interfaces for querying models through a web browser, and it provides an API for querying models automatically. The system is easy to use and can be tuned to individual needs. The software can be useful both to those who need to work with vector semantic models but do not want to develop their own interfaces, and to those who need to deliver their trained models to a large audience. WebVectors features visualizations for various kinds of semantic queries. Web services for Russian, English and Norwegian models, built with WebVectors, are currently available.
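To make the idea of a semantic query served over an API more concrete, here is a minimal sketch of the kind of computation such a query typically involves: finding a word's nearest neighbours by cosine similarity and serialising the result as JSON for an automated client. Everything here is an assumption for illustration. The toy vectors, the function names (`cosine`, `most_similar`, `answer_query`) and the JSON layout are hypothetical, and they are not WebVectors' actual code or response format.

```python
import json
import math

# Hypothetical toy "model": words mapped to made-up 3-dimensional vectors.
# A real service would load trained embeddings (e.g. word2vec) instead.
TOY_MODEL = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(model, word, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`."""
    target = model[word]
    scores = [
        (other, cosine(target, vec))
        for other, vec in model.items()
        if other != word  # skip the query word itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


def answer_query(model, word, topn=3):
    """Answer a nearest-neighbour query as a JSON string.

    The response layout is a hypothetical example only.
    """
    if word not in model:
        return json.dumps({"word": word, "error": "unknown word"})
    neighbours = most_similar(model, word, topn)
    return json.dumps(
        {
            "word": word,
            "neighbours": [
                {"word": w, "similarity": round(s, 4)} for w, s in neighbours
            ],
        }
    )


if __name__ == "__main__":
    print(answer_query(TOY_MODEL, "cat", topn=2))
```

Running the script prints the two nearest neighbours of "cat" in the toy model ("dog", then "truck"). A production service would load trained embeddings, for instance with a library such as Gensim, and would add input normalisation and caching on top of a lookup like this.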
