JoBimText Visualizer: A Graph-based Approach to Contextualizing Distributional Similarity

We introduce an interactive visualization component for the JoBimText project, an open-source platform for large-scale distributional semantics based on graph representations. First, we describe the underlying technology: computing a distributional thesaurus over words from bipartite graphs of words and context features, and contextualizing the resulting list of semantically similar words to a given sentential context using graph-based ranking. We then demonstrate this contextualized text expansion technology in an interactive visualization. The visualization can be used as a semantic parser that provides contextualized expansions of words in text, as well as disambiguation to word senses induced by graph clustering. It is provided as an open-source tool.
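To make the bipartite-graph idea concrete, the following is a minimal sketch and not the JoBimText implementation. It assumes that word similarity is simply the count of shared context features, and that contextualization re-ranks expansions by how many features of the current context they have been observed with. The function names, the feature strings, and the toy data are all hypothetical; the actual system's scoring and ranking may differ.

```python
from collections import defaultdict

def build_bipartite(pairs):
    """Build both sides of a word/context-feature bipartite graph."""
    word_to_feats = defaultdict(set)
    feat_to_words = defaultdict(set)
    for word, feat in pairs:
        word_to_feats[word].add(feat)
        feat_to_words[feat].add(word)
    return word_to_feats, feat_to_words

def similar_words(word, word_to_feats, feat_to_words):
    """Rank other words by the number of context features shared with `word`."""
    counts = defaultdict(int)
    for feat in word_to_feats.get(word, set()):
        for other in feat_to_words.get(feat, set()):
            if other != word:
                counts[other] += 1
    # Highest overlap first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def contextualize(word, context_feats, word_to_feats, feat_to_words):
    """Re-rank the expansions of `word` by their fit to the given context features."""
    context_feats = set(context_feats)
    ranked = []
    for other, shared in similar_words(word, word_to_feats, feat_to_words):
        fit = len(word_to_feats.get(other, set()) & context_feats)
        ranked.append((other, fit, shared))
    # Context fit first, then global overlap, then alphabetical order.
    return sorted(ranked, key=lambda t: (-t[1], -t[2], t[0]))

# Hypothetical toy observations: (word, context feature) pairs.
pairs = [
    ("cold", "obj_of:catch"), ("cold", "mod:nasty"), ("cold", "mod_of:weather"),
    ("flu", "obj_of:catch"), ("flu", "mod:nasty"),
    ("hot", "mod_of:weather"), ("warm", "mod_of:weather"),
    ("ball", "obj_of:catch"),
]
w2f, f2w = build_bipartite(pairs)

global_ranking = similar_words("cold", w2f, f2w)
weather_ranking = contextualize("cold", {"mod_of:weather"}, w2f, f2w)
```

In this toy data, the context-free ranking for "cold" is led by "flu", whereas restricting to the `mod_of:weather` context moves "hot" and "warm" to the top. This mirrors, in a very simplified way, the shift from a global similarity list to a contextualized expansion.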
