Efficient Parsing with Large-Scale Unification Grammars

The efficiency problem in parsing with large-scale unification grammars, including implementations in the Head-driven Phrase Structure Grammar (HPSG) framework, used to be a serious obstacle to their application in research and commercial settings. Over the past few years, however, significant progress in efficient processing has been achieved. Still, many of the proposed techniques were developed only in isolation, making it difficult to compare them and to assess their combined potential. Also, a number of techniques were never evaluated on large-scale grammars. This thesis sets out to improve this situation by reviewing, integrating, and evaluating a number of techniques for efficient unification-based parsing. A strong focus is placed on efficient graph unification. I provide an overview of previous work in this area of research, including the foundational algorithm of Wroblewski (1987), for which I identify a previously unnoticed flaw and provide a solution. I introduce the PET platform, which has been developed with two goals: (i) to serve as a flexible basis for research in efficient processing techniques, allowing precise empirical study and comparison of different approaches, and (ii) to provide an efficient run-time processor that supports fruitful scientific and practical utilization of HPSG grammars. The design and implementation of PET are presented in detail, including a closer look at efficient semi-lattice computation in the preprocessor. A number of experiments with PET are discussed, using three existing large-scale HPSG grammars of English, Japanese, and German. I give precise empirical answers to some open research questions, most importantly the question of feature structure encoding (lists of feature-value pairs versus representations based on fixed arity), and show that this is a much less important factor than often assumed. I also address the question of predicting practical performance across grammars and processing platforms.
Finally, I take a wider perspective and report on the overall improvement of processing performance for HPSG grammars (as exemplified by the LinGO grammar) that has been achieved over a period of four years by an international consortium of research groups.

[1]  Karel Driesen,et al.  Software and Hardware Techniques for Efficient Polymorphic Calls , 1999 .

[2]  Stefan Müller,et al.  Deutsche Syntax deklarativ: Head-Driven Phrase Structure Grammar für das Deutsche , 1999 .

[3]  Hideto Tomabechi Quasi-Destructive Graph Unification with Structure-Sharing , 1992, COLING.

[4]  Jun'ichi Tsujii,et al.  Computing Phrasal-signs in HPSG prior to Parsing , 1996, COLING.

[5]  Carl Pollard,et al.  Parsing Head-Driven Phrase Structure Grammar , 1985, ACL.

[6]  Lorna Balkan,et al.  TSNLP - Test Suites for Natural Language Processing , 1996, COLING.

[7]  Patrick Lincoln,et al.  Efficient implementation of lattice operations , 1989, TOPL.

[8]  R. S. Boyer,et al.  The sharing of structure in theorem proving programs , 1972 .

[9]  Hideto Tomabechi Quasi-Destructive Graph Unification , 1991, ACL.

[10]  Gregor Erbach,et al.  ProFIT: Prolog with Features, Inheritance and Templates , 1995, EACL.

[11]  Bob Carpenter,et al.  ALE : the attribute logic engine user's guide, version 2.0.1 , 1992 .

[12]  Bob Carpenter,et al.  An Abstract Machine for Attribute-Value Logics , 1995, IWPT.

[13]  Mark-Jan Nederhof,et al.  Efficient and Robust Parsing of Word Hypotheses Graphs , 2000 .

[14]  Gregor Erbach,et al.  A Bottom-Up Algorithm for Parsing and Generation , 2000 .

[15]  Stephan Oepen,et al.  Collaborative language engineering : a case study in efficient grammar-based processing , 2002 .

[16]  Dale Gerdemann,et al.  Term Encoding of Typed Feature Structures , 1995, IWPT.

[17]  Hassan Aït-Kaci,et al.  Warren's Abstract Machine: A Tutorial Reconstruction , 1991 .

[18]  Stephan Oepen,et al.  Measure for Measure: Parser Cross-fertilization - Towards Increased Component Comparability and Exchange , 2000, IWPT.

[19]  Hideto Tomabechi,et al.  Signature‐check based unification filter , 1994 .

[20]  Booncharoen Sirinaovakul,et al.  Introduction to the Special Issue , 2002, Comput. Intell..

[21]  Hideto Tomabechi,et al.  Design of Efficient Unification for Natural Language , 1995 .

[22]  Stephan Oepen,et al.  Towards systematic grammar profiling. Test suite technology 10 years after , 1998, Comput. Speech Lang..

[23]  Liviu Ciortuz Scaling up the Abstract Machine for Unification of OSF-Terms to do Head-Corner Parsing with Large-Scale Typed Unification Grammars , 2000, WLP.

[24]  Kurt Godden,et al.  Lazy Unification , 1990, ACL.

[25]  Ulrich Callmeier,et al.  PET – a platform for experimentation with efficient HPSG processing techniques , 2000, Natural Language Engineering.

[26]  David A. Wroblewski,et al.  Nondestructive Graph Unification , 1987, AAAI.

[27]  Jun'ichi Tsujii,et al.  LiLFes - Towards a Practical HPSG Parser , 1998, COLING-ACL.

[28]  Dan Flickinger,et al.  An Open Source Grammar Development Environment and Broad-coverage English Grammar Using HPSG , 2000, LREC.

[29]  Gerald Penn,et al.  The Algebraic Structure of Attributed Type Signatures , 2000 .

[30]  Nissim Francez,et al.  Abstract Machine for Typed Feature Structures , 1995, ArXiv.

[31]  Bob Carpenter,et al.  Compiling Typed Attribute-Value Logic Grammars , 1993, IWPT.

[32]  Stuart M. Shieber,et al.  Using Restriction to Extend Parsing Algorithms for Complex-Feature-Based Formalisms , 1985, ACL.

[33]  Hans-Ulrich Krieger,et al.  TDL-A Type Description Language for Constraint-Based Grammars , 1994, COLING.

[34]  Gertjan van Noord,et al.  Head-driven Parsing for Lexicalist Grammars: Experimental Results , 1993, EACL.

[35]  N. Curteanu Book Reviews: Lecture on Contemporary Syntactic Theories: An Introduction to Unification-Based Approaches to Grammar , 1987, CL.

[36]  John T. Maxwell,et al.  Formal issues in lexical-functional grammar , 1998 .

[37]  Thilo Götz A Normal Form For Typed Feature Structures , 1994 .

[38]  Martin Kay,et al.  Head-Driven Parsing , 1989, IWPT.

[39]  Martin Kay,et al.  Algorithm schemata and data structures in syntactic processing , 1986 .

[40]  Stefan Müller,et al.  HPSG Analysis of German , 2000 .

[41]  Bob Carpenter,et al.  The logic of typed feature structures , 1992 .

[42]  Peter Van Roy,et al.  The Wonder Years of Sequential Prolog Implementation , 1996 .

[43]  Peter Van Roy,et al.  1983-1993: The Wonder Years of Sequential Prolog Implementation , 1994, J. Log. Program..

[44]  Lorna Balkan,et al.  Test Suites for Natural Language Processing , 1995, TC.

[45]  David H. D. Warren,et al.  Applied logic : its use and implementation as a programming tool , 1978 .

[46]  Evgeniy Gabrilovich,et al.  Amalia - A Unified Platform for Parsing and Generation , 1997, ArXiv.

[47]  Martin C. Emele Unification with Lazy Non-Redundant Copying , 1991, ACL.

[48]  Ulrich Schäfer,et al.  Efficient Parameterizable Type Expansion for Typed Feature Formalisms , 1995, IJCAI.

[49]  Martin C. Emele,et al.  Typed Unification Grammars , 1990, COLING.

[50]  Stephan Oepen,et al.  Ambiguity Packing in Constraint-based Parsing: Practical Results , 2000, ANLP.

[51]  Rob Malouf,et al.  Efficient feature structure operations without compilation , 2000, Natural Language Engineering.

[52]  Dan Flickinger,et al.  HPSG Analysis of English , 2000 .

[53]  Dan Flickinger On building a more efficient grammar by exploiting types , 2000 .

[54]  Kiyoshi Kogure,et al.  Strategic Lazy Incremental Copy Graph Unification , 1990, COLING.

[55]  Seth Copen Goldstein,et al.  Order Sorted Feature Theory Unification , 1993, J. Log. Program..

[56]  Ivan A. Sag,et al.  Information-Based Syntax and Semantics: Volume 1, Fundamentals , 1987 .

[57]  Günter Neumann,et al.  DISCO-An HPSG-based NLP System and its Application for Appointment Scheduling Project Note , 1994, COLING.

[58]  Hans-Ulrich Krieger,et al.  A Bag of Useful Techniques for Efficient and Robust Parsing , 1999, ACL.

[59]  G. Kurtz,et al.  Deutsche Syntax deklarativ. Head-Driven Phrase Structure Grammar für das Deutsche , 2001 .

[60]  Gregor Erbach,et al.  A Flexible Parser for a Linguistic Development Environment , 1991, Text Understanding in LILOG.

[61]  Martin C. Emele,et al.  A Fixed-Point Semantics for Feature Type Systems , 1990, CTRS.

[62]  Marcel P. van Lohuizen Memory-Efficient and Thread-Safe Quasi-Destructive Graph Unification , 2000, ACL.

[63]  Geoffrey K. Pullum,et al.  Generalized Phrase Structure Grammar , 1985 .

[64]  John A. Carroll Relating Complexity to Practical Performance in Parsing With Wide-Coverage Unification Grammars , 1994, ACL.

[65]  Jun'ichi Tsujii,et al.  The LiLFeS Abstract Machine and its evaluation with the LinGO grammar , 2000, Nat. Lang. Eng..

[66]  Lauri Karttunen,et al.  D-PATR: A Development Environment for Unification-Based Grammars , 1986, COLING.

[67]  Dan Flickinger,et al.  Structure-Sharing in Lexical Representation , 1985, ACL.

[68]  Jun'ichi Tsujii,et al.  An HPSG parser with CFG filtering , 2000, Nat. Lang. Eng..

[69]  Hans-Ulrich Krieger,et al.  A Context-free Approximation of Head-driven Phrase Structure Grammar , 2000, IWPT.

[70]  Dan Flickinger,et al.  On building a more efficient grammar by exploiting types , 2000, Natural Language Engineering.

[71]  Jun'ichi Tsujii,et al.  Translating the XTAG English grammar to HPSG , 1998, TAG+.

[72]  Ann A. Copestake,et al.  The ACQUILEX LKB: representation issues in semi-automatic acquisition of large lexicons , 1992, ANLP.

[73]  C. Pollard,et al.  Center for the Study of Language and Information , 2022 .

[74]  Christian Schulte Comparing Trailing and Copying for Constraint Programming , 1999, ICLP.

[75]  Lauri Karttunen,et al.  Structure Sharing with Binary Trees , 1985, ACL.

[76]  Ivan A. Sag,et al.  Book Reviews: Head-driven Phrase Structure Grammar and German in Head-driven Phrase-structure Grammar , 1996, CL.

[77]  Shuly Wintner,et al.  An Abstract Machine for Unification Grammars , 1997, cmp-lg/9709013.

[78]  Hans Uszkoreit,et al.  Strategies for Adding Control Information to Declarative Grammars , 1991, ACL.

[79]  Ann A. Copestake,et al.  Appendix: Definitions of typed feature structures , 2000, Natural Language Engineering.

[80]  Melanie Siegel,et al.  HPSG Analysis of Japanese , 2000 .

[81]  John C. Brown,et al.  Compilation versus abstract machines for fast parsing of typed feature structure grammars , 2000, Future Gener. Comput. Syst..

[82]  Wolfgang Wahlster,et al.  Verbmobil: Foundations of Speech-to-Speech Translation , 2000, Artificial Intelligence.

[83]  Gertjan van Noord An Efficient Implementation of the Head-Corner Parser , 1997, CL.

[84]  Fernando Pereira,et al.  A Structure-Sharing Representation for Unification-Based Grammar Formalisms , 1985, ACL.