The analysis and acquisition of proper names for robust text understanding

In this thesis we consider the problems that Proper Names cause in the analysis of unedited, naturally-occurring text. Proper Names are problematic because they occur frequently in many types of text, are poorly covered by conventional dictionaries, are important to the text understanding process, and have a complex structure, as does the text that describes them. For the most part these problems have been ignored in Natural Language Processing, with the result that Proper Names are one of the field's most under-researched areas.

As a solution, we present a detailed description of the syntax and semantics of seven major classes of Proper Name and of their surrounding context. This description leads to syntactic and semantic rules specifically for the analysis of Proper Names, which capitalise on the wealth of descriptive material that often accompanies a Proper Name when it occurs in a text. This approach side-steps the problem of lexical coverage by allowing a text processing system to use the very text it is analysing to construct lexical and knowledge base entries for unknown Proper Names as it encounters them. The information acquired on unknown Proper Names goes considerably beyond a simple syntactic and semantic classification: it consists of a detailed genus and differentia description.

A complete solution to the 'Proper Name Problem' must include approaches to handling apposition, conjunction and ellipsis, abbreviated reference, and many of the far-from-standard phenomena encountered in naturally-occurring text. The thesis advances partial, practical solutions in all of these areas. To set the work in context, the problems of Proper Names are viewed as a subset of the general problem of lexical inadequacy as it arises in processing real, unedited text. The whole of this field is reviewed, and various methods of lexical acquisition are compared and evaluated.

Our approach to coping with lexical inadequacy and to handling Proper Names is implemented in a news text understanding system called FUNES, which automatically acquires detailed genus and differentia information on Proper Names as it encounters them while processing news text. We present an assessment of the system's performance on a sample of unseen news text and hold that it supports the validity of our approach to handling Proper Names.
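The central idea, using the descriptive material that accompanies a Proper Name in the text itself to build a lexical entry for it, can be illustrated with a deliberately shallow sketch. It is not the FUNES rule set, which is not specified here; it assumes a single hypothetical surface pattern for a comma-delimited appositive (as in "Ian Smith, chairman of Acme Corp,") and treats the head noun phrase as the genus and a following prepositional modifier as the differentia. The names `APPOSITIVE`, `LexicalEntry` and `acquire_entries` are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """Hypothetical genus-and-differentia entry for an unknown Proper Name."""
    name: str
    genus: str
    differentia: list[str] = field(default_factory=list)


# Assumed pattern (illustration only): a run of capitalised tokens, a comma,
# an optional determiner, a lower-case head noun phrase (the genus) and an
# optional post-modifying phrase (the differentia), closed by a comma or full stop.
APPOSITIVE = re.compile(
    r"\b(?P<name>[A-Z][\w.'-]*(?:\s+[A-Z][\w.'-]*)*)"   # Proper Name: capitalised tokens
    r",\s+(?:(?:a|an|the)\s+)?"                          # comma, then optional determiner
    r"(?P<genus>[a-z][\w-]*(?:\s+[a-z][\w-]*)*?)"        # genus: shortest lower-case phrase
    r"(?:\s+(?P<diff>(?:of|at|for|from|in|based in|with)\s+[^,.]+))?"  # differentia
    r"[,.]"                                              # appositive closes at , or .
)


def acquire_entries(text: str) -> dict[str, LexicalEntry]:
    """Build entries for Proper Names introduced by an appositive in `text`."""
    lexicon: dict[str, LexicalEntry] = {}
    for m in APPOSITIVE.finditer(text):
        # The first appositive seen for a name supplies its genus;
        # later appositives only add further differentia.
        entry = lexicon.setdefault(
            m.group("name"), LexicalEntry(m.group("name"), m.group("genus"))
        )
        if m.group("diff"):
            entry.differentia.append(m.group("diff").strip())
    return lexicon


if __name__ == "__main__":
    sample = ("Ian Smith, chairman of Acme Corp, said on Monday that "
              "Acme Corp, a software company based in Leeds, would expand.")
    for entry in acquire_entries(sample).values():
        print(entry)
```

A single surface pattern like this over-generates (a fronted phrase such as "In London, officials said." would be read as a name with the genus "officials said") and ignores the conjunction, ellipsis and abbreviated-reference cases the thesis also addresses. It is meant only to show the shape of a genus and differentia entry built from the text being analysed.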