A distributional memory for German

This paper describes the creation of a Distributional Memory (Baroni and Lenci 2010) resource for German. Distributional Memory is a generalized distributional resource for lexical semantics that does not have to commit to a particular vector space when it is created. We induce such a resource from a German corpus, following the original design decisions as closely as possible, and discuss the steps needed to port the approach to a new language. We evaluate the German DM model on a synonym selection task and find that it is competitive with existing models.
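As a rough illustration of what "not committing to a particular vector space" can mean in practice, the sketch below stores weighted word-link-word tuples once and derives two different spaces from them on demand, then uses one of them for a toy synonym selection. The tuples, weights, link labels, and function names are all hypothetical and are not taken from the paper; cosine similarity is assumed here as the comparison measure.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical weighted word-link-word tuples (toy values, not from the paper).
TUPLES = [
    ("Hund", "subj", "bellen", 5.0),
    ("Hund", "obj", "füttern", 3.0),
    ("Köter", "subj", "bellen", 4.0),
    ("Köter", "obj", "füttern", 1.0),
    ("Auto", "obj", "fahren", 6.0),
    ("Auto", "obj", "füttern", 0.5),
]


def word_by_link_word(tuples):
    """Derive a space whose rows are words and whose columns are (link, word) pairs."""
    space = defaultdict(dict)
    for w1, link, w2, weight in tuples:
        space[w1][(link, w2)] = weight
    return dict(space)


def word_word_by_link(tuples):
    """Derive a space whose rows are word pairs and whose columns are links."""
    space = defaultdict(dict)
    for w1, link, w2, weight in tuples:
        space[(w1, w2)][link] = weight
    return dict(space)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def select_synonym(target, candidates, space):
    """Pick the candidate whose row vector is most similar to the target's."""
    target_vec = space.get(target, {})
    return max(candidates, key=lambda c: cosine(target_vec, space.get(c, {})))


if __name__ == "__main__":
    w_by_lw = word_by_link_word(TUPLES)
    print(select_synonym("Hund", ["Köter", "Auto"], w_by_lw))
```

The same tuple store feeds both `word_by_link_word` and `word_word_by_link`, so the choice of space is deferred until a task needs it.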

[1] H. Schütze, et al. Dimensions of meaning, 1992, Supercomputing '92.

[2] Michael E. Lesk, et al. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, 1986, SIGDOC '86.

[3] Sabine Schulte im Walde. Experiments on the Automatic Induction of German Semantic Verb Classes, 2006, CL.

[4] Julie Weeds, et al. Finding Predominant Word Senses in Untagged Text, 2004, ACL.

[5] G. Miller, et al. Contextual correlates of semantic similarity, 1991.

[6] Regina Barzilay, et al. Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches, 2009, J. Artif. Intell. Res.

[7] Yves Peirsman, et al. Semantic relations in bilingual lexicons, 2011, TSLP.

[8] Stefan Evert, et al. The Statistics of Word Cooccurrences: Word Pairs and Collocations, 2004.

[9] Bernd Bohnet, et al. Top Accuracy and Fast Dependency Parsing is not a Contradiction, 2010, COLING.

[10] David Yarowsky, et al. Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora, 2001, NAACL.

[11] Alessandro Lenci, et al. Distributional Memory: A General Framework for Corpus-Based Semantics, 2010, CL.

[12] Dekang Lin, et al. Automatic Retrieval and Clustering of Similar Words, 1998, ACL.

[13] Katrin Erk, et al. A Flexible, Corpus-Driven Model of Regular and Inverse Selectional Preferences, 2010, CL.

[14] Pascale Fung, et al. BiFrameNet: Bilingual Frame Semantics Resource Construction by Cross-lingual Induction, 2004, COLING.

[15] Suzanne Stevenson, et al. A General Feature Space for Automatic Verb Classification, 2003, EACL.

[16] Adam Kilgarriff, et al. Large Linguistically-Processed Web Corpora for Multiple Languages, 2006, EACL.

[17] Zellig S. Harris, et al. Distributional Structure, 1954.

[18] Patrick Pantel, et al. From Frequency to Meaning: Vector Space Models of Semantics, 2010, J. Artif. Intell. Res.

[19] Mirella Lapata, et al. Dependency-Based Construction of Semantic Space Models, 2007, CL.

[20] Graeme Hirst, et al. Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance, 2007, EMNLP.

[21] T. Landauer, et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge, 1997.

[22] Emanuele Pianta, et al. Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus, 2005, Natural Language Engineering.

[23] Ulrich Heid, et al. Design and Application of a Gold Standard for Morphological Analysis: SMOR as an Example of Morphological Evaluation, 2010, LREC.

[24] Peter D. Turney. Similarity of Semantic Relations, 2006, CL.

[25] Andrew Y. Ng, et al. Learning random walk models for inducing word dependency distributions, 2004, ICML.