DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German

Derivational models are still an underresearched area in computational morphology. Even for German, a rather resource-rich language, there is a lack of large-coverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a high-coverage German resource, DErivBase, mapping over 280k lemmas into more than 17k non-singleton clusters. We focus on the rule component and a qualitative and quantitative evaluation. Our approach achieves up to 93% precision and 71% recall. We attribute the high precision to the fact that our rules are based on information from grammar books.
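
To make the idea of rule-based family induction concrete, here is a minimal Python sketch. It is not the paper's implementation: the three suffix rules, the lowercase toy lemma list, and the function names `apply_rule` and `induce_families` are all assumptions made for illustration. The sketch applies each rule to each lemma, links the lemma to the derived form whenever that form is also in the lexicon, and reads the families off as connected components using a union-find structure.

```python
from collections import defaultdict

# Hypothetical toy rules as (source suffix, target suffix) pairs.
# They are illustrative only and are not taken from DErivBase.
TOY_RULES = [
    ("en", "ung"),   # verb -> event noun, e.g. "heilen" -> "heilung"
    ("en", "er"),    # verb -> agent noun, e.g. "lehren" -> "lehrer"
    ("", "lich"),    # noun -> adjective, e.g. "freund" -> "freundlich"
]


def apply_rule(lemma, rule):
    """Return the derived form, or None if the rule does not apply."""
    src, tgt = rule
    if src and not lemma.endswith(src):
        return None
    stem = lemma[: len(lemma) - len(src)]
    return stem + tgt


def induce_families(lemmas, rules):
    """Cluster lemmas that are connected by at least one rule application."""
    lexicon = set(lemmas)
    parent = {lemma: lemma for lemma in lexicon}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for lemma in lexicon:
        for rule in rules:
            derived = apply_rule(lemma, rule)
            # Only link forms that are attested in the lexicon.
            if derived in lexicon:
                parent[find(lemma)] = find(derived)

    families = defaultdict(set)
    for lemma in lexicon:
        families[find(lemma)].add(lemma)
    return sorted(sorted(family) for family in families.values())


if __name__ == "__main__":
    toy_lemmas = ["heilen", "heilung", "heiler", "lehren", "lehrer",
                  "freund", "freundlich", "haus"]
    for family in induce_families(toy_lemmas, TOY_RULES):
        print(family)
```

Restricting links to attested lemmas keeps the rules from generating unattested forms, and "haus" stays a singleton because no rule connects it to another entry. A real system would also need part-of-speech constraints, capitalization handling, and stem changes such as umlaut, all of which this sketch leaves out.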
