The Generalized Upper Model Knowledge Base: Organization and Use

In this paper we discuss some issues in designing and re-using an abstract ontology for domain modelling. We take our Generalized Upper Model Knowledge Base (GUM), an ontology being developed primarily for Natural Language Processing applications, as a starting point. The GUM knowledge base has been used in several contexts, including multilingual generation projects and information retrieval projects, supporting different knowledge domains. The motivations for this 'linguistically motivated' ontology are proving themselves to offer re-usability across domains, tasks, and languages as well as the possibility of substantial scaling-up. We describe the general principles underlying the organization of the knowledge base, some steps towards extending its use, and some examples of the content thus motivated.

Magnini is partially supported by the European Union LRE (Language Research and Engineering) projects GIST and TRANSTERM. Fabris is supported by a grant from the Comune di Trento. Bateman is also a member of the Penman Project, USC/Information Sciences Institute, Marina del Rey, Los Angeles. In addition to the authors of the present paper, the current stage of development of the Generalized Upper Model has been significantly shaped by Renate Henschel and Fabio Rinaldi.

1 MOTIVATIONS AND OVERVIEW

Experience with constructing general natural language generation and analysis components has demonstrated that interfacing such components with application systems or users is substantially simplified by the provision of general organizations of information that are linguistically motivated (cf. Penman [36], XTRA [1], LILOG [24], Alfresco [40] and many others; see [3] for a comprehensive overview of positions). Moreover, it is argued in [2] that, in order to be effective, such general organizations of information must aim to achieve two potentially conflicting goals. On the one hand, the organization must achieve a sufficient level of abstraction in the semantic types employed to escape the idiosyncrasies of surface realization and to ease interfacing with (possibly non-linguistically oriented) domain knowledge. On the other, the organization must still maintain a sufficiently close relationship to surface regularities to permit operationalisation and interfacing with natural language surface components. When the link with surface realization is broken, our experience shows that the modelling style becomes under-constrained and re-usability suffers.

Our starting point is the ongoing development work on the design and use of upper models as originally proposed in the USC/ISI-BBN Janus collaboration [29, 30, 33]. An upper model is an abstract, linguistically motivated ontology meeting both requirements stated above. In [6], we introduced our current work, in which we are pursuing the development of an upper model that is both sufficient for natural language processing needs and re-usable and shareable across different languages as well as across different domains/tasks. This Generalized Upper Model provides semantic distinctions appropriate and adequate for supporting natural language processing for (at least) Italian, German and English. Hence the more specific designation of the ontology described: the Generalized English, German, Italian Upper Model.

The design philosophy of this Generalized Upper Model is that linguistically motivated concepts and concept organizations are provided which are, as far as possible, valid across distinct languages. However, there is no theoretical requirement that all concepts will be relevant for all languages. In this respect our approach differs from standard conceptions of an 'interlingua'. Indeed, it is to be expected that languages differ in the semantic organizations they require.
However, it is also expected that the level of abstraction of the Generalized Upper Model is sufficient for substantial sharing and re-use across languages precisely because details of surface form have been left behind. The network as a whole, although multilingual in orientation, therefore makes no assumptions of universality. Rather the reverse is the case; that is, we assume that there will be differences between languages in the kinds of experiential semantic distinctions that they draw. To the extent, however, that different grammatical systems need to perform similar communicative functional tasks, they will induce similarities in the semantic organization. The motivation for this position is given in detail in [8, 7, 32, 9].

Our expectations have been strongly supported by our work so far, where extensions and alterations made on the basis of linguistic evidence from some particular language have most often proved equally applicable to the other languages covered. Examples of this for German and English have already been presented in [19]; additional discussion for Italian was presented in [6].

The general organization of information thus created offers many advantages for knowledge representation. The distinctions drawn tend to be finer and more broadly motivated than distinctions based on non-linguistic, or task- or domain-specific, knowledge. Moreover, there is the additional functionality that whenever knowledge is organized in the manner described by the Upper Model, its expression in natural language is significantly simplified. This more fine-grained set of categories brings its own problems, however: it requires more effort for a knowledge engineer to use and more information in order to use it correctly. In this paper, our aim is to show some of the steps towards operationalization of the Generalized Upper Model that can be taken and to provide some examples of the kind of modelling it enables.
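
The organization described above can be illustrated with a minimal sketch. It is not the GUM implementation: the class, the concept names and the language tags are all hypothetical, chosen only to show how domain concepts might be placed beneath linguistically motivated concepts, and how a concept need not be relevant for every language.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

# Languages covered in this illustration (assumed tags, not part of GUM).
ALL_LANGUAGES = frozenset({"en", "de", "it"})


@dataclass(eq=False)
class Concept:
    """One node in a toy subsumption hierarchy (hypothetical structure)."""
    name: str
    parent: Optional["Concept"] = None
    upper: bool = True                      # False marks a domain concept
    languages: frozenset = ALL_LANGUAGES    # languages the concept is relevant for

    def lineage(self) -> Iterator["Concept"]:
        """Yield this concept, then its ancestors up to the root."""
        node: Optional["Concept"] = self
        while node is not None:
            yield node
            node = node.parent

    def subsumed_by(self, other: "Concept") -> bool:
        """True if `other` is this concept or one of its ancestors."""
        return any(node is other for node in self.lineage())

    def upper_anchor(self, language: str) -> Optional["Concept"]:
        """Nearest upper-model concept that is relevant for `language`."""
        for node in self.lineage():
            if node.upper and language in node.languages:
                return node
        return None


# Hypothetical fragment: none of these names are taken from GUM itself.
thing = Concept("thing")
process = Concept("process", thing)
communication = Concept("communication-process", process)
de_only = Concept("de-specific-distinction", communication,
                  languages=frozenset({"de"}))
report_fault = Concept("report-fault", de_only, upper=False)  # domain concept

print(report_fault.upper_anchor("de").name)
print(report_fault.upper_anchor("en").name)
```

In this sketch a generator for German would see the finer, German-only distinction, while a generator for English would fall back to the shared concept above it; the domain concept itself is placed only once.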
The paper begins with an outline of some of the criteria that have been established for including concepts and discriminations in such ontologies and their use for knowledge representation. We go on to present a detailed application of these criteria in one area of the conceptual hierarchy: that of communication processes. Finally, we briefly overview projects making use of versions of the Upper Model and describe ongoing work.

2 CRITERIA FOR BUILDING THE

[1] John A. Bateman, et al. The Merged Upper Model: A Linguistic Ontology for German and English, 1994, COLING.

[2] Gregor Erbach. Multi-Dimensional Inheritance, 1994, ArXiv.

[3] John A. Bateman, et al. The Re-use of Linguistic Resources across Languages in Multilingual Generation Components, 1991, IJCAI.

[4] Sergei Nirenburg, et al. Approximating an Interlingua in a Principled Way, 1992, HLT.

[5] John A. Bateman. Upper Modeling: organizing knowledge for natural language processing, 1990, INLG.

[6] Robert M. MacGregor, et al. The Loom Knowledge Representation Language, 1987.

[7] John A. Bateman, et al. The Theoretical Status of Ontologies in Natural Language Processing, 1997, ArXiv.

[8] Oliviero Stock, et al. Natural Language and Exploration of an Information Space: The ALFresco Interactive System, 1991, IJCAI.

[9] Carlo Strapparava, et al. An Approach To Multilevel Semantics For Applied Systems, 1992, ANLP.

[10] Michael Halliday, et al. An Introduction to Functional Grammar, 1985.

[11] C. Matthiessen. Lexicogrammatical cartography: English systems, 1995.

[12] Norbert Reithinger, et al. XTRA: A Natural-Language Access System to Expert Systems, 1989, Int. J. Man Mach. Stud.

[13] Beth Levin, et al. English Verb Classes and Alternations: A Preliminary Investigation, 1993.

[14] Christian M. I. M. Matthiessen, et al. Multilingual generation: Dimensions of organization and forms of representation, 1992.

[15] Manfred Stede, et al. Generating Multilingual Documents from a Knowledge Base: The TECHDOC Project, 1994, COLING.

[16] Ewald Lang. The LILOG Ontology from a Linguistic Point of View, 1991, Text Understanding in LILOG.