Paper information - Word manager: A system for morphological dictionaries


This book describes Word Manager (WM), a lexical database system intended to serve a wide range of NLP applications. The most striking aspect of WM itself, and of this presentation, is its windowed user interface. The book consists of five chapters ('Introduction,' 'The Word Manager Approach,' 'Inflection,' 'Wordformation' [sic], and 'Linguistic Theories') and a lengthy appendix giving syntax definitions for the user language.

A WM linguistic description consists of a tree-shaped hierarchy whose nodes contain declarations, feature specifications, and rules and entries of various kinds. Each node is identified by a feature set and associated with a window, the label of the node being inherited by those below it in the tree. Affixation in its simplest form is indicated by juxtaposing feature specifications that identify the segments concerned. More complex cases requiring spelling adjustments at segment boundaries are treated by means of 'match and map' rules. These are regular expression pattern matchers with bindings and substitutions, and may be paired with feature sets that govern their application.

WM spelling rules differ from the more familiar two-level variety in being ordered. So, for example, the plural form amici of the Italian amico 'friend' can be produced by adding h to the stem (as would be done for the normal case baco, bachi 'silkworm') and then removing it again. The authors adopt without comment extrinsic rule ordering and radical non-monotonicity when elsewhere the tendency, shared by many theoretical linguists, has for some time been to abandon them.

One reason for choosing a hierarchical organization is that redundancy can be minimized through the use of inheritance. However, several aspects of WM give rise to unnecessarily redundant specifications. Patterns in spelling rules must apparently subsume the entire string against which they match; this leads to the presence of repeated subexpressions whose only function is to skip irrelevant characters. What has been missed here is the fact that the phenomena these rules are intended to handle are essentially boundary effects. The pattern-matching component of the rules shows no sign of having been designed for, or even adapted to, the purpose for which it is employed. Similarly, it seems necessary to specify for each word segment mentioned in a rule not only its lexical form but also all of its surface forms, in addition to supplying spelling rules that implicitly express the same correspondence. The intention is to permit cross-checking during compilation, but the tracing facilities offered by WM should make this unnecessary. There is a general impression of piecemeal design, almost as if the shell of WM had been developed without regard to linguistic considerations and then fleshed out with rules and entries at the last moment when it was too late to change anything.

But the real weak point of this book lies less in WM itself than in the presentation. Typically, this proceeds by describing some morphological phenomenon, and comparing two or three possible analyses. The emphasis is entirely on examples and the syntax of the system; nowhere do we find a clear statement of how the syntax is to be interpreted, an account of the formal properties of the various mechanisms employed, or proper motivation for the choice of these mechanisms rather than others. There are some interesting ideas hidden below the surface (morphological rules can be specialized to handle exceptions, feature values are used to encode paths through the hierarchy), but it is hard to evaluate them in this form.

The first chapter sets out to provide justification for WM and draws comparisons with other approaches to morphology and lexical organization. The authors' awareness of such work seems quite rudimentary: the three contrasted approaches are finite-state lexicons with two-level rules (Koskenniemi 1983), DATR (Evans and Gazdar 1990), and the Celex databases. No mention is made of, for example, the lexical knowledge base created for the Acquilex project (Copestake [1992] is a recent report, but the project has been well publicized for much longer), or the large amount of work that has been done by Bear (1988), Trost (1991), and oth-
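
The ordered 'match and map' rules described in the review can be illustrated with a small Python sketch of the amico/amici example. This is not WM's rule syntax, which the review does not reproduce. The `Rule` class, the `+` boundary marker, the stem forms, and the `soft_plural` feature name are all assumptions made for illustration. Each pattern is matched against the whole form, mirroring the remark that patterns must subsume the entire string, which is why every rule carries a leading `(.*)` whose only job is to skip characters.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a 'match and map' rule (not WM syntax)."""
    pattern: str                        # regex that must cover the whole form
    template: str                       # substitution built from bound groups
    required: frozenset = frozenset()   # features that license the rule


# Rules are tried in this order; the ordering is what makes amici derivable.
RULES = [
    # 1. Insert h between a stem-final c/g and the plural ending i.
    Rule(r"(.*[cg])\+(i)", r"\1h+\2"),
    # 2. Remove that h again, but only for lexemes carrying the
    #    (assumed) exception feature 'soft_plural'.
    Rule(r"(.*c)h\+(i)", r"\1+\2", frozenset({"soft_plural"})),
    # 3. Erase the segment boundary marker.
    Rule(r"(.*)\+(.*)", r"\1\2"),
]


def apply_rules(form, features=frozenset()):
    """Apply each licensed rule once, in order, to the whole form."""
    for rule in RULES:
        if not rule.required <= features:
            continue
        match = re.fullmatch(rule.pattern, form)
        if match:
            form = match.expand(rule.template)
    return form


def plural(stem, features=frozenset()):
    """Juxtapose stem and plural ending, then run the spelling rules."""
    return apply_rules(stem + "+i", features)


print(plural("bac"))                               # regular case: bachi
print(plural("amic", frozenset({"soft_plural"})))  # h added, then removed: amici
```

The second rule undoes the work of the first, which is the non-monotonic behaviour the review objects to. A two-level description would instead state the surface correspondence once, without an intermediate form.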

Pius ten Hacken | Marc Domenig

[1] Ann A. Copestake et al. The ACQUILEX LKB: representation issues in semi-automatic acquisition of large lexicons, 1992, ANLP.

[2] Ronald M. Kaplan et al. Restriction and Correspondence-based Translation, 1993, EACL.

[3] Paul Rayson et al. Automatic Content Analysis of Spoken Discourse, 1992.