An approach to natural language for document retrieval

Document retrieval systems have been restricted, by the nature of the task, to techniques that can be applied to large numbers of documents and broad domains. The most effective techniques developed so far are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem: the comparison of a request text with the text of document titles and abstracts. The proposed NLP techniques are used to build a request model based on “conceptual case frames” and to compare this model with the texts of candidate documents. The request model also supplies information to the statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations were constructed for a set of requests from the CACM collection. Statistical searches that used dependency and relative importance information derived from these request models indicate that performance benefits can be obtained.
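To make the idea of feeding a request model into a statistical search more concrete, the sketch below shows one possible shape for it. Everything here is an illustrative assumption, not the representation or search model used in this work: `CaseFrame`, `request_terms`, and `score` are hypothetical names, the "half of the frame's importance" rule for slot fillers is arbitrary, and the additive score with a proximity bonus is a deliberately simple stand-in for whatever statistical search model is actually used. It only shows how relative importance (term weights) and dependencies (head/filler pairs) extracted from case frames could influence document ranking.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CaseFrame:
    """Hypothetical conceptual case frame: a head concept plus filled case slots."""
    head: str
    slots: dict[str, str] = field(default_factory=dict)  # case role -> filler term
    importance: float = 1.0                              # hypothetical relative importance


def request_terms(frames):
    """Flatten frames into term weights and (head, filler) dependency pairs."""
    weights, pairs = {}, set()
    for frame in frames:
        weights[frame.head] = max(weights.get(frame.head, 0.0), frame.importance)
        for filler in frame.slots.values():
            # Assumption: slot fillers inherit half of the frame's importance.
            weights[filler] = max(weights.get(filler, 0.0), frame.importance / 2)
            pairs.add((frame.head, filler))
    return weights, pairs


def score(doc_text, weights, pairs, window=5, pair_bonus=0.5):
    """Sum weights of matched terms, plus a bonus for each dependent pair
    whose two terms occur within `window` tokens of each other."""
    tokens = re.findall(r"[a-z]+", doc_text.lower())  # naive tokenizer
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    total = sum(w for term, w in weights.items() if term in positions)
    for head, filler in pairs:
        if any(abs(i - j) <= window
               for i in positions.get(head, [])
               for j in positions.get(filler, [])):
            total += pair_bonus
    return total
```

For example, a request frame with head `retrieval` (importance 2.0) and fillers `documents` and `parsing` gives "Syntactic parsing improves retrieval of documents." a score of 5.0 (three matched terms plus two nearby dependent pairs), while "Documents about library catalogues." scores only 1.0. Ranking candidates by this score rewards documents in which the request's related concepts appear together, not just documents that share isolated words.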
