Towards Efficient HPSG Generation for German, a Non-Configurational Language

In this paper, we propose a rule-based method to improve efficiency in bottom-up chart generation with GG, an open-source, reversible, large-scale HPSG for German. Following an in-depth analysis of efficiency problems in the baseline system, we show that the costly combinatorial explosion of brute-force bottom-up search can be largely avoided by using information already contained implicitly in the input semantics: either (i) information is globally present but needs to be made locally available to a particular elementary predication, or (ii) semantic configurations in the input have a clear translation to syntactic constraints, provided some knowledge of the grammar. We propose several performance features targeting inflection and extraction, as well as more language-specific features relating to verb movement and discontinuous complex predicates. In a series of experiments on three different test suites, we show that 7 out of 8 features are consistently effective in reducing generation times, both in isolation and in combination. Combining all efficiency measures, we observe a speedup factor of 4.5 for our less complex test suites, increasing to almost 28 for the more complex one. The fact that performance benefits increase drastically with input length suggests that our method scales well, in the sense that it effectively heads off the problem of exponential growth. The present approach of using a generator-internal transfer grammar has the added advantage that it locates performance-related issues close to the grammar, thereby keeping the external semantic interface as general as possible.
TITLE AND ABSTRACT IN GERMAN (English translation)

Effiziente HPSG-Generierung für das Deutsche (Efficient HPSG Generation for German)

We present a rule-based method for automatically enriching the semantic input of a reversible HPSG of German. It largely avoids costly uninformed search in bottom-up chart generation by (i) making global information that is implicitly present in the input explicit and locally available, and (ii) deriving syntactic constraints from semantic configurations. We propose performance features for several phenomena, including inflection, extraction, verb movement, and discontinuous complex predicates. Our experiments show substantial efficiency gains (factors of 4.5 to 27.8) that grow with increasing input complexity, which demonstrates that our method scales well. The generator-internal transfer approach has the further advantage that performance aspects are handled close to the grammar, so the external semantic interface remains as general as possible.
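
The effect described in the abstract, where information that is only globally present is made locally available to an elementary predication, can be illustrated with a toy sketch. This is not GG's lexicon, the paper's transfer rules, or any real generator API: the predicate names `_def_q` and `_hund_n`, the `(form, case)` edge encoding, and both helper functions are assumptions made purely for illustration. The sketch only counts how many lexical edge combinations an uninformed bottom-up search could attempt, compared with a search where each predication already knows its case.

```python
from math import prod

# Toy lexicon (an assumption for illustration, not GG's actual lexicon):
# each elementary predication (EP) maps to candidate lexical edges,
# written here as (form, case) pairs for a masculine singular noun phrase.
LEXICON = {
    "_def_q": [("der", "nom"), ("den", "acc"), ("dem", "dat"), ("des", "gen")],
    "_hund_n": [("Hund", "nom"), ("Hund", "acc"), ("Hund", "dat"), ("Hundes", "gen")],
}


def lexical_edges(eps, lexicon, local_case=None):
    """Return candidate edges per EP, filtered by case when it is known locally."""
    edges = {}
    for ep in eps:
        candidates = lexicon[ep]
        if local_case is not None:
            candidates = [c for c in candidates if c[1] == local_case]
        edges[ep] = candidates
    return edges


def combinations_to_try(edges):
    """Upper bound on the edge combinations a brute-force search could attempt."""
    return prod(len(candidates) for candidates in edges.values())


eps = ["_def_q", "_hund_n"]
# Uninformed: every inflectional variant of every EP enters the chart.
uninformed = combinations_to_try(lexical_edges(eps, LEXICON))
# Informed: case has been made locally available on each EP beforehand.
informed = combinations_to_try(lexical_edges(eps, LEXICON, local_case="nom"))
print(uninformed, informed)
```

In this toy setting the uninformed bound is the product of the per-predication candidate counts, so it grows exponentially with the number of predications, while the informed bound stays at one candidate per predication. That is consistent with the abstract's observation that the benefit increases with input length, though the real system's behaviour depends on the grammar and cannot be read off this sketch.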
