Measure for Measure: Parser Cross-fertilization - Towards Increased Component Comparability and Exchange

Over the past few years, significant progress has been made in efficient processing with wide-coverage HPSG grammars. HPSG-based parsing systems are now available that can process medium-complexity sentences (of ten to twenty words, say) in average parse times equivalent to real (i.e. human reading) time. Many of the engineering improvements in current HPSG systems were achieved through collaboration among multiple research centers and the mutual exchange of experience, encoding techniques, algorithms, and even pieces of software. This article presents an approach to grammar and system engineering, termed competence & performance profiling, that makes systematic experimentation and the precise empirical study of system properties a focal point in development. Adapting the profiling metaphor familiar from software engineering to constraint-based grammars and parsers enables developers to maintain an accurate record of system evolution, identify grammar and system deficiencies quickly, and compare results against earlier versions or across different systems. We discuss a number of exemplary problems that motivate the experimental approach, and apply the empirical methodology in a fairly detailed discussion of what was achieved during a development period of three years. Given the collaborative nature of the setup, the empirical results we present involve the research and achievements of a large group of people.
