Measure for measure: towards increased component comparability and exchange

Over the past few years, significant progress has been made in efficient processing with wide-coverage HPSG grammars. HPSG-based parsing systems are now available that can process medium-complexity sentences (of ten to twenty words, say) in average parse times equivalent to real (i.e. human reading) time. A large number of engineering improvements in current HPSG systems have been achieved through collaboration among multiple research centers and the mutual exchange of experience, encoding techniques, algorithms, and even pieces of software. This article presents an approach to grammar and system engineering, termed competence & performance profiling, that makes systematic experimentation and the precise empirical study of system properties a focal point in development. Adapting the profiling metaphor familiar from software engineering to constraint-based grammars and parsers enables developers to maintain an accurate record of system evolution, identify grammar and system deficiencies quickly, and compare a system against its own earlier versions or against other systems. We discuss a number of example problems that motivate the experimental approach, and apply the empirical methodology in a fairly detailed discussion of progress made during a development period of three years.
