Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding

In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional models of semantics that demonstrate cognitive plausibility. We find that word representations learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse, effective, and highly interpretable. To the best of our knowledge, this is the first approach that yields semantic representations of words satisfying all three of these desirable properties. Through extensive experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority of semantic models learned by NNSE over other state-of-the-art baselines.
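The abstract characterizes NNSE as a variant of matrix factorization whose codes are constrained to be sparse and non-negative. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation: it approximates an L1-regularized, non-negative factorization of a word-by-context co-occurrence matrix using scikit-learn's dictionary learning. The matrix sizes, hyperparameters, and choice of library are illustrative assumptions.

```python
# Sketch (illustrative, not the paper's code): factor a word-by-context matrix X
# into sparse, non-negative word codes A and a dictionary of latent dimensions D,
# roughly in the spirit of non-negative sparse coding:
#   minimize ||X - A D||_F^2 + lambda * ||A||_1   subject to  A >= 0
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
X = rng.random((1000, 300))        # placeholder: 1000 words x 300 co-occurrence features

model = MiniBatchDictionaryLearning(
    n_components=100,              # number of latent dimensions (hypothetical setting)
    alpha=0.1,                     # sparsity penalty (hypothetical setting)
    positive_code=True,            # enforce non-negative codes A >= 0
    random_state=0,
)
A = model.fit_transform(X)         # sparse, non-negative word embeddings (1000 x 100)
D = model.components_              # latent dimensions over context features (100 x 300)

# Interpretability check: inspect the words that load most heavily on one dimension.
dim = 0
top_words = np.argsort(-A[:, dim])[:10]
```

Because each word is described by a small number of non-negative weights, a latent dimension can be read off directly from its top-weighted words, which is the kind of interpretability the abstract claims for NNSE.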
