Panel on Natural Language and Databases

While I disagree with the proposition that database query has outlived its usefulness as a test environment for natural language processing (for reasons that I give below), I believe there are other reasonable tasks which can also spur new research in NL processing. In particular, I will suggest that the task of providing a natural language interface to a rich programming environment offers a convenient yet challenging extension of work already being done with database query.

First I recite some of the merits of continuing research on natural language within the confines of constructing an interface for ordinary databases. One advantage is that the speed of processing is not of overwhelming importance in this application, since one who requests information from a database can expect the retrieval to take time, with or without a natural language interface. Of course speed is desirable, and waiting for answers to apparently simple requests will be irritating, but some delay will be tolerable. This tolerance on the part of the user will, I suggest, disappear in applications where an attempt is made to engage a system in dialogue with the user, as would be the case in some expert systems, or in teaching systems. Assuming that natural language systems will not by themselves get faster as they are made to cope with larger fragments of a natural language, it will be useful to continue with database query while we wait for miracles of technology to fill our demands for ever greater processing speed.

A second reason for not yet abandoning the database query as a test environment is that a great deal of important natural language processing research remains to be done in generalizing systems to cope with more than one natural language.
Work on language universals gives reason to believe that some significant part of a natural language system for English should be recyclable in constructing a system for some other language. How much these cross-linguistic concerns ought to affect the construction of a particular system is itself one of the questions deserving of attention, but our experience to date suggests that it pays to avoid language-particular solutions in an implementation which aspires to treatment of any sizable fragment of a language, even a single language like English. The degree of language-independence that a natural language system can boast may also prove to be one useful metric for evaluating and comparing such systems. It seems clear that even the task of answering database queries will provide a more than adequate supply of linguistically interesting problems for this line of research.

Finally, it has simply not been our experience at Hewlett-Packard that there is any shortage of theoretically interesting problems to solve in constructing a natural language interface for databases. For example, in building such an interface, we have recently designed and implemented a hierarchically structured lexicon for a fragment of English, together with a set of lexical rules that can be run either when loading the system, or when parsing, to greatly expand the size of the lexicon actually used in the system. Several questions of theoretical interest that arose in that process remain unanswered; at least some can be answered by experimenting with our present system, functioning simply as an interface to an ordinary relational database.
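
To make the idea of a hierarchically structured lexicon with load-time or parse-time lexical rules concrete, the following is a minimal illustrative sketch in Python. It is not the Hewlett-Packard implementation, whose details are not given here; every name in it (`LexClass`, `Entry`, `Lexicon`, `third_singular`, the feature names) is a hypothetical choice made for the example, and each rule is applied only once to each base entry, with no chaining of rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LexClass:
    """A node in a (hypothetical) lexical hierarchy; features are inherited from the parent."""
    name: str
    features: Dict[str, str] = field(default_factory=dict)
    parent: Optional["LexClass"] = None

    def all_features(self) -> Dict[str, str]:
        # Features declared lower in the hierarchy override inherited ones.
        inherited = self.parent.all_features() if self.parent else {}
        return {**inherited, **self.features}


@dataclass
class Entry:
    form: str
    features: Dict[str, str]


# A lexical rule maps an entry to a derived entry, or to None if it does not apply.
Rule = Callable[[Entry], Optional[Entry]]


def make_entry(form: str, lex_class: LexClass) -> Entry:
    """Build a base entry whose features come from its class and that class's ancestors."""
    return Entry(form, lex_class.all_features())


def third_singular(entry: Entry) -> Optional[Entry]:
    """Assumed example rule: derive a third-person-singular form from a base verb."""
    if entry.features.get("cat") == "verb" and entry.features.get("form") == "base":
        return Entry(entry.form + "s", {**entry.features, "form": "3sg"})
    return None


class Lexicon:
    def __init__(self, base: List[Entry], rules: List[Rule], expand_at_load: bool):
        self.rules = rules
        self.expand_at_load = expand_at_load
        self.entries: Dict[str, List[Entry]] = {}
        for entry in base:
            self._add(entry)
        if expand_at_load:
            # Load-time expansion: store every derived entry up front.
            for entry in base:
                for derived in self._derive(entry):
                    self._add(derived)

    def _add(self, entry: Entry) -> None:
        self.entries.setdefault(entry.form, []).append(entry)

    def _derive(self, entry: Entry) -> List[Entry]:
        results = []
        for rule in self.rules:
            derived = rule(entry)
            if derived is not None:
                results.append(derived)
        return results

    def lookup(self, form: str) -> List[Entry]:
        found = list(self.entries.get(form, []))
        if not self.expand_at_load:
            # Parse-time expansion: run the rules on demand and keep matching forms.
            for stored in self.entries.values():
                for entry in stored:
                    found.extend(d for d in self._derive(entry) if d.form == form)
        return found


# Example hierarchy: word > verb > transitive
word = LexClass("word")
verb = LexClass("verb", {"cat": "verb", "form": "base"}, parent=word)
transitive = LexClass("transitive", {"subcat": "np"}, parent=verb)

eager = Lexicon([make_entry("show", transitive)], [third_singular], expand_at_load=True)
lazy = Lexicon([make_entry("show", transitive)], [third_singular], expand_at_load=False)
```

In this sketch both lexicons return an equivalent entry for the derived form "shows"; the difference lies only in whether the expanded lexicon is stored when the system is loaded or computed during lookup, which is the trade-off between storage and parsing time that the choice of rule application point implies.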
Having argued that significant work remains to be done in natural language processing as an interface to databases, I nonetheless believe that it would be fruitful to expand the scope of a natural language interface, to permit some manipulation of a programming environment, allowing not only the retrieval of information describing the state of the system, but also some modification of the system via natural language. Of course, such a task would be facilitated by having the information about the environment stored in a manner similar to that of a database, so that our attention could be devoted to the new range of linguistic issues raised, rather than to details of how the whole programming environment is structured and maintained. I will not offer an account of how such a merging of database and general programming environment might be accomplished, but instead will offer some motivation for stretching natural language research in this direction.

It seems clear, first of all, that such an interface would be useful, given that even a common programming environment provides a wide array of tools, not all of which are familiar to any one user. While it is usually the case that one who is accustomed to a given facility would be hampered by having to employ only a natural language interface to accomplish familiar tasks (e.g., imagine typing "Move down to the beginning of the next line" every time a carriage return was required), such an interface would be invaluable when trying to utilize an unfamiliar part of the system. A related benefit would be the ability of a user new to a programming environment to customize that environment without any detailed knowledge of it.
This indirect access to the multitude of parameters that determine the behavior of a complex environment would also be convenient for an experienced user attempting to alter some