A Mixture Model with Sharing for Lexical Semantics

We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: polysemous word usage is often governed by some common background metaphoric usage (e.g., the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We highlight particular cases where modeling shared structure is beneficial and where it can be detrimental.
