Building the CODER Lexicon: The Collins English Dictionary and Its Adverb Definitions

The CODER (COmposite Document Expert/extended/effective Retrieval) project investigates the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." To support some of the desired processing, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine-readable Collins Dictionary of the English Language. Background, motivation, and a survey of related work are given first, followed by a discussion of the Collins lexicon. The conversion process, the format of the resulting Prolog database, and characteristics of the dictionary and its relations are then described. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a summary of adverb definitions is presented, together with a description of the defining formulae that usually indicate the type of the adverb. Ultimately, it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
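The idea of using a defining formula to suggest an adverb's type, and recording the result as a Prolog fact, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three patterns, the type labels, the `adverb_type/2` predicate name, and the helper functions are hypothetical and are not taken from the CODER lexicon or the Collins data.

```python
import re

# Hypothetical defining formulae (assumed for illustration only; the abstract
# does not list the formulae actually found in the Collins definitions).
FORMULAE = [
    ("manner", re.compile(r"^in an? .+ (manner|way)$")),
    ("degree", re.compile(r"^to an? .+ (degree|extent)$")),
    ("time", re.compile(r"^at .+ time$")),
]


def classify_adverb(definition):
    """Return the label of the first formula matching the definition, else None."""
    text = definition.strip().lower().rstrip(".")
    for label, pattern in FORMULAE:
        if pattern.match(text):
            return label
    return None


def quote_atom(word):
    """Quote a word as a Prolog atom, doubling any embedded single quotes."""
    return "'" + word.replace("'", "''") + "'"


def to_prolog_fact(headword, definition):
    """Render a hypothetical adverb_type/2 fact for one headword and definition."""
    label = classify_adverb(definition) or "unknown"
    return "adverb_type({}, {}).".format(quote_atom(headword), label)


if __name__ == "__main__":
    print(to_prolog_fact("quickly", "In a quick manner."))
    print(to_prolog_fact("greatly", "To a great extent"))
    print(to_prolog_fact("very", "used to add emphasis"))
```

Definitions that match no pattern are labeled `unknown` rather than guessed, which mirrors the abstract's wording that a formula "usually" indicates the type.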
