We describe a new approach to syntactic generation with Head-Driven Phrase Structure Grammars (HPSG) that uses an extensive off-line preprocessing step. Direct generation algorithms apply the phrase-structure rules (schemata) of the grammar on-line, which is a computationally expensive step. Instead, we collect off-line, for every lexical type of the HPSG grammar, all minimally complete projections (called elementary trees) that can be derived with the schemata. This process is known as ‘compiling HPSG to TAG’ and derives a Lexicalized Tree-Adjoining Grammar (LTAG). The representation as an LTAG is ‘fully lexicalized’ in the sense that all grammatical information is directly encoded with the lexical item (as a set of elementary trees), and the combination operations are reduced from schema applications to the TAG primitives of adjunction and substitution. Given this LTAG, the generation task has a very different search space that can be traversed very efficiently, avoiding the costly on-line applications of HPSG unification. The entire generation task from a semantic representation to a surface string is split into two subtasks, handled by a microplanner and a syntactic realizer. This paper discusses the syntactic generator and the preprocessing steps as implemented in the Verbmobil system.

1 Generation in a Speech-to-Speech System

The syntactic generation algorithm and the preprocessing steps presented in this paper are integrated into the Verbmobil system (see [Wahlster 1993, Bub, Wahlster, and Waibel 1997]), a system for speech-to-speech dialog translation. The input for the generation module VM-GECO (VerbMobil GEneration COmponents) is produced by a semantic-based transfer component (see [Dorna and Emele 1996]). The chosen interface language encodes target-language-specific semantic information in a combination of Underspecified Discourse Representation Theory and Minimal Recursion Semantics (see [Bos et al. 1996] and [Copestake, Flickinger, and Sag 1997]). The internal architecture of the generation module is modularized: it is separated into two phases, a microplanner and a syntactic generator. Throughout the system, we emphasize declarativity, which is also a necessary precondition for comprehensive off-line preprocessing of external knowledge bases, in particular the preprocessing of the underlying Head-Driven Phrase Structure Grammar (HPSG, see [Pollard and Sag 1994]). The grammar has been developed at CSLI; it reflects recent developments in the linguistic theory, has fairly wide coverage, and also covers phenomena of spoken language.

2 Microplanning and Syntactic Generation

Starting from the semantic representation, the microplanning component generates an annotated dependency structure which is used by the syntactic generation component to realize a surface string. The microplanner also carries out word choice. One goal of this modularization is a stepwise constraining of the search space of alternative linguistic realizations, using different views in the different modules. In each step, only an abstraction of the multitude of information contained in an alternative needs to be considered.
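For illustration, the division of labour between the two phases can be pictured roughly as follows. This is a minimal sketch with hypothetical names (DependencyNode, realize, and the non-verbal type names are invented for the example and are not the actual VM-GECO data structures), assuming only that the microplanner output can be read as a tree of chosen lexical items with their annotations.

# Illustrative sketch only (hypothetical class and function names, not the
# actual VM-GECO interfaces): the microplanner performs word choice and
# delivers an annotated dependency structure; the syntactic realizer maps
# that structure to a surface string.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DependencyNode:
    """One chosen lexical item plus realization-relevant annotations."""
    lemma: str                                      # result of word choice
    lexical_type: str                               # e.g. MV_NP_TRANS_LE
    features: Dict[str, str] = field(default_factory=dict)
    dependents: List["DependencyNode"] = field(default_factory=list)

def realize(node: DependencyNode) -> str:
    """Toy realizer: the real component selects precompiled elementary
    trees for each node and combines them; here we merely linearize the
    dependency tree to show the shape of the interface."""
    pre = [realize(d) for d in node.dependents[:1]]    # first dependent before the head
    post = [realize(d) for d in node.dependents[1:]]   # remaining dependents after it
    return " ".join(pre + [node.lemma] + post)

# A possible microplanner output for "Tuesday suits me":
plan = DependencyNode(
    lemma="suit", lexical_type="MV_NP_TRANS_LE",
    features={"tense": "present"},
    dependents=[DependencyNode("Tuesday", "np_le"),    # "np_le" is a made-up type name
                DependencyNode("me", "pers_pro_le")],  # so is "pers_pro_le"
)
print(realize(plan))   # -> "Tuesday suit me" (inflection would come from the
                       #    morphological information, omitted in this toy)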
Another aspect of this architecture is the separation into a kernel system, i.e., the language-independent core algorithms (a constraint solver for microplanning and the search and combination algorithms for syntactic generation described in section 5), and declarative knowledge bases, e.g., the language-specific word-choice constraints in microplanning and the TAG grammars used in syntactic realization. This separation allows for an easy adaptation of the system to other languages and domains (see [Becker et al. 1998]).

3 Declarativity in the Syntactic Generator

All modules of the generator utilize external, declarative knowledge bases. For the syntactic generator, extensive off-line preprocessing of the highly declarative HPSG grammar for English is applied. The grammar has not even been written exclusively as a generation grammar; it is specialized, however, in that it covers phenomena of spoken language. The high level of abstraction achieved in the hierarchically organized grammar description (see [Flickinger 1987]) allows for easy maintenance as well as off-line preprocessing. The off-line preprocessing steps described in the next section keep the declarative nature of the grammar intact, i.e., they retain explicitly the phrase structures and syntactic features as defined by the HPSG grammar. In general, declarative knowledge bases allow for an easier adaptation of the system to other domains and languages. This is a considerable benefit in the current second phase of the Verbmobil project [Becker et al. 1996], where the generator is extended to cover German, English and Japanese as well as additional and extended domains with a considerably larger vocabulary.

4 Off-Line Preprocessing: HPSG to TAG Compilation

The subtasks in a direct syntactic generator based on an HPSG grammar will always include the application of schemata (the HPSG equivalent of phrase structure rules) such that all syntactic constraints introduced by a lexical item (especially its SUBCAT list) are fulfilled. This results in a constant repetition of, e.g., building up the projection of a verb in a declarative sentence. In preprocessing the HPSG grammar, we aim at computing all possible partial phrase structures which can be derived from the information in a lexicon entry. Given such sets of possible syntactic realizations, together with a set of selected lexicon entries for an utterance and their dependencies, the task of the syntactic generator is simplified considerably. Instead of exploring all possible, computationally expensive applications of HPSG schemata, it merely has to find suitable precomputed syntactic structures for each lexical item and combine them appropriately. (The HPSG grammar is being developed at CSLI, Stanford University, on a grammar development platform based on TDL [Krieger and Schäfer 1994]; in fact, most of the testing during grammar development depends on the use of a parser.)

For this preprocessing of the HPSG grammar, we adapted the ‘HPSG to TAG compilation’ process described in [Kasper et al. 1995]. The basis for the compilation is the identification of syntactically relevant selector features which express the subcategorization requirements of a lexical item, e.g., the VALENCE features. In general, a phrase structure is complete when these selector features are empty. Starting from the feature structure for a lexical item, HPSG schemata are applied such that the current structure is unified with a daughter feature of the schema; a simplified sketch of this step is given below.
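To make the single compilation step concrete, the following minimal sketch treats feature structures as nested dictionaries and unification as recursive merging; it deliberately ignores types and structure sharing (reentrancies), and the schema and feature names (HEAD-DTR, NONHEAD-DTR, MOTHER, the toy head-complement schema) are illustrative stand-ins rather than those of the actual grammar.

# Minimal sketch of one compilation step, assuming feature structures are
# nested dicts and unification is plain recursive merging (no types or
# reentrancies, unlike real HPSG feature structures).
from copy import deepcopy

class UnificationFailure(Exception):
    pass

def unify(a, b):
    """Unify two feature structures represented as nested dicts or atoms."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = deepcopy(a)
        for feature, value in b.items():
            result[feature] = unify(result[feature], value) if feature in result else deepcopy(value)
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} does not unify with {b!r}")

def apply_schema(schema, daughter, structure):
    """Unify the current structure into one daughter slot of a schema and
    return the resulting (mother) structure, or None on failure."""
    try:
        instantiated = unify(schema, {daughter: structure})
    except UnificationFailure:
        return None
    return instantiated["MOTHER"]   # 'MOTHER' is an illustrative feature name

# Toy head-complement schema: the head daughter's single VALENCE requirement
# is filled by the non-head daughter; the mother has an empty VALENCE.
HEAD_COMP = {
    "HEAD-DTR":    {"HEAD": {"CAT": "verb"},
                    "VALENCE": {"FIRST": {"CAT": "np"}, "REST": "empty"}},
    "NONHEAD-DTR": {"CAT": "np"},
    "MOTHER":      {"HEAD": {"CAT": "verb"}, "VALENCE": "empty"},
}

verb = {"HEAD": {"CAT": "verb"},
        "VALENCE": {"FIRST": {"CAT": "np"}, "REST": "empty"}}
print(apply_schema(HEAD_COMP, "HEAD-DTR", verb))
# -> {'HEAD': {'CAT': 'verb'}, 'VALENCE': 'empty'}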
The resulting structure is again subject to this process. The compilation stops when certain termination criteria are met, e.g., when all selector features are empty. Thus, all projections from the lexical item are collected as a set of minimally complete phrase structures, which can also be interpreted as elementary trees of a Tree-Adjoining Grammar (TAG).

Instead of actually applying this compilation process to all lexical items, certain abstractions over the lexical entries are specified in the HPSG grammar. In fact, the needs of the compilation process have led to a clear-cut separation of lexical types and lexical entries, as shown in Figure 1. A typical lexical entry is shown in Figure 2 and demonstrates that only three kinds of information are stored: the lexical type (MV_NP_TRANS_LE), the semantic contribution (the relation SUIT_REL) and morphological information (the stem and potentially irregular forms). By expanding the lexical type, the full feature structure can be obtained.

[Figure 1 (labels): Phrase Structure: Schemata, HPSG Principles; Lexicon Hierarchy: Syntactic Types, Semantic Types, Lexical Type, Lexical Instance, Morphological Information.]
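Putting the pieces together, the overall off-line loop can be pictured as follows. This is only a schematic illustration of the compilation described in [Kasper et al. 1995]: the completeness test, the depth bound and the plain feature-structure representation of the collected trees are simplifying stand-ins, and the real compiler uses additional termination criteria beyond empty selector features.

# Schematic sketch of the off-line compilation loop: starting from the
# expanded feature structure of a lexical type, keep applying schemata;
# whenever the selector (valence) features are empty, record the current
# projection as one elementary tree of the resulting LTAG.
from typing import Callable, Dict, List, Optional

FeatureStructure = Dict[str, object]
Schema = Callable[[FeatureStructure], Optional[FeatureStructure]]

def selector_features_empty(fs: FeatureStructure) -> bool:
    """Illustrative completeness test: all valence requirements saturated."""
    return fs.get("VALENCE") == "empty"

def compile_lexical_type(start: FeatureStructure,
                         schemata: List[Schema],
                         max_depth: int = 4) -> List[FeatureStructure]:
    """Collect the minimally complete projections (elementary trees) that
    can be derived from `start` by applying the schemata."""
    trees: List[FeatureStructure] = []
    agenda = [(start, 0)]
    while agenda:
        structure, depth = agenda.pop()
        if selector_features_empty(structure):
            trees.append(structure)          # minimally complete projection
            continue                         # do not project any further
        if depth >= max_depth:
            continue                         # crude stand-in for the real termination criteria
        for schema in schemata:
            projected = schema(structure)    # unify into a daughter slot, return the mother
            if projected is not None:
                agenda.append((projected, depth + 1))
    return trees

Consistent with the lexical entry shown in Figure 2, a lexical entry at run time then only needs to name its lexical type; the set of elementary trees precompiled for that type is retrieved by lookup rather than derived on-line.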
References

[1] Wolfgang Wahlster et al. Verbmobil: the combination of deep and shallow processing for spontaneous speech translation. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.
[2] Johan Bos et al. Compositional Semantics in Verbmobil. COLING, 1996.

[3] Peter Poller et al. An Efficient Kernel for Multilingual Generation in Speech-to-Speech Dialogue Translation. COLING-ACL, 1998.

[4] Bernd Kiefer et al. Compilation of HPSG to TAG. ACL, 1995.

[5] Ivan A. Sag et al. Book Reviews: Head-driven Phrase Structure Grammar and German in Head-driven Phrase-structure Grammar. Computational Linguistics, 1996.

[6] Wolfgang Wahlster et al. Verbmobil: Translation of Face-To-Face Dialogs. MT Summit, 1993.

[7] Hans-Ulrich Krieger et al. TDL - A Type Description Language for Constraint-Based Grammars. COLING, 1994.

[8] Martin C. Emele et al. Semantic-based Transfer. COLING, 1996.

[9] Aravind K. Joshi et al. Unification-Based Tree Adjoining Grammars. 1991.

[10] Aravind K. Joshi et al. Parsing Strategies with ‘Lexicalized’ Grammars: Application to Tree Adjoining Grammars. COLING, 1988.

[11] Matthew Stone et al. Sentence Planning as Description Using Tree Adjoining Grammar. ACL, 1997.

[12] Kathleen F. McCoy et al. From functional specification to syntactic structures: systemic grammar and tree adjoining grammar. 1991.

[13] Gertjan van Noord. An overview of head-driven bottom-up generation. 1990.

[14] James Pustejovsky et al. TAG’s as a Grammatical Formalism for Generation. HLT, 1986.

[15] Dan Flickinger et al. Minimal Recursion Semantics: An Introduction. 2005.

[16] Chris Mellish et al. Current research in natural language generation. 1990.

[17] Gertjan van Noord et al. Semantic-Head-Driven Generation. Computational Linguistics, 1990.

[18] D. Flickinger. Lexical rules in the hierarchical lexicon. 1989.

[19] Aravind K. Joshi et al. An Introduction to Tree Adjoining Grammar. 1987.