Parser engineering and performance profiling

We describe and argue for a strategy of performance profiling and comparison in the engineering of parsing systems for wide-coverage linguistic grammars. A performance profile is a precise, rich and structured snapshot of system (and grammar) behaviour at a given development point. The aim is to characterize system performance at a very detailed technical level while abstracting away from the idiosyncrasies of particular processors. Profiles are obtained with minimal effort by applying a specialized profiling tool to a set of structured reference data (taken from both existing test suites and corpora), in conjunction with a uniform format for test data and processing results. The resulting profiles can be analyzed and visualized at various levels of granularity to highlight different aspects of system performance, thus providing a solid empirical basis for system refinement and optimization. Since profiles are stored in a database, comparison with earlier versions, different parameter settings, or other processing systems is straightforward. We apply several salient performance metrics in a contrastive discussion of various (one-pass, bottom-up, chart-based) parsing strategies (viz. passive vs. active and uni- vs. bidirectional approaches). Based on insights gained from detailed performance profiles, we outline and evaluate a novel ‘hyper-active’ parsing strategy. We also present preliminary profiles for techniques for ‘packing’ of local ambiguities with respect to (partial) subsumption of feature structures.
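The idea of storing profiles in a database and comparing them across system versions can be illustrated with a minimal sketch. This is not the profiling tool described above: the table layout, the metric names (`readings`, `tasks`, `seconds`), the run labels and all numbers are hypothetical, chosen only to show how per-item results in a uniform format make aggregate and item-by-item comparison simple.

```python
import sqlite3


def open_profile_db():
    """Create an in-memory database with one row per (run, test item)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE result ("
        " run TEXT, item INTEGER, words INTEGER,"
        " readings INTEGER, tasks INTEGER, seconds REAL,"
        " PRIMARY KEY (run, item))"
    )
    return db


def record(db, run, rows):
    """Store results for one run; each row is (item, words, readings, tasks, seconds)."""
    db.executemany(
        "INSERT INTO result VALUES (?, ?, ?, ?, ?, ?)",
        [(run, *row) for row in rows],
    )


def summarize(db, run):
    """Aggregate view of one profile: coverage and mean cost per item."""
    items, parsed, tasks, seconds = db.execute(
        "SELECT COUNT(*), SUM(readings > 0), AVG(tasks), AVG(seconds)"
        " FROM result WHERE run = ?",
        (run,),
    ).fetchone()
    if items == 0:
        return {"items": 0, "coverage": None, "mean_tasks": None, "mean_seconds": None}
    return {
        "items": items,
        "coverage": parsed / items,
        "mean_tasks": tasks,
        "mean_seconds": seconds,
    }


def compare(db, old, new):
    """Item-level view: (item, tasks in old run, tasks in new run) for shared items."""
    return db.execute(
        "SELECT a.item, a.tasks, b.tasks FROM result a"
        " JOIN result b ON a.item = b.item"
        " WHERE a.run = ? AND b.run = ? ORDER BY a.item",
        (old, new),
    ).fetchall()


# Hypothetical data: the same two test items processed by two parser variants.
db = open_profile_db()
record(db, "passive", [(1, 5, 2, 100, 0.5), (2, 8, 0, 300, 1.5)])
record(db, "active", [(1, 5, 2, 80, 0.4), (2, 8, 0, 150, 0.9)])

print(summarize(db, "passive"))
print(compare(db, "passive", "active"))
```

Keeping one row per run and test item, rather than only aggregate totals, is what allows the same data to be viewed at different levels of granularity: `summarize` gives the coarse picture, while `compare` exposes the individual items on which two versions diverge.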
