Automatic text structuring and retrieval: experiments in automatic encyclopedia searching

Many conventional approaches to text analysis and information retrieval prove ineffective when large text collections must be processed in heterogeneous subject areas. An alternative text manipulation system is outlined that is useful for the retrieval of large heterogeneous texts and for the recognition of content similarities between text excerpts. It is based on flexible text matching procedures carried out in several contexts of different scope. The methods are illustrated by search experiments performed with the 29-volume Funk and Wagnalls encyclopedia.

Department of Computer Science, Cornell University, Ithaca, NY 14853-7501. This study was supported in part by the National Science Foundation under grant IRI 89-15847.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1991 ACM 0-89791-448-1/91/0009/0021 $1.50

Approaches to Text Retrieval

We are addressing the problem of text retrieval from large, heterogeneous text databases, where the vocabulary varies widely and the subject matter is unrestricted. Such databases include newspaper articles, newswire dispatches, textbooks, dictionaries and encyclopedias, manuals, magazine articles, and so on. Necessarily, the retrieval operations must be preceded by some form of text analysis, or text indexing operation, that assigns appropriate content identifications to query and document texts. When content identifiers are attached to text items, the text retrieval operations can be carried out by comparing the identifications assigned to the various text items with the query identifications, and retrieving texts that appear sufficiently similar to the corresponding queries [1-2].

In discussing the text analysis operation, it is customary to contrast the so-called keyword approach with more sophisticated procedures based in part on linguistic and logical considerations. In the keyword approach, single-term concepts, known as keywords,
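
The comparison of query and document identifications described in this section can be illustrated with a minimal sketch. It assumes that content identifiers are single-term keywords with raw term-frequency weights and that similarity is measured by the cosine of the two term vectors; the excerpt does not specify either choice. The function names, the stopword list, and the threshold value are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "for", "in", "to", "is"}


def content_identifiers(text):
    """Assign keyword identifiers to a text: lowercase terms with frequency weights."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, threshold=0.2):
    """Return (doc_id, score) pairs scoring at least `threshold`, best first."""
    query_ids = content_identifiers(query)
    scored = []
    for doc_id, text in documents.items():
        score = cosine_similarity(query_ids, content_identifiers(text))
        if score >= threshold:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `retrieve("text retrieval", {"a": "...", "b": "..."})` returns only those texts whose identifiers overlap the query's enough to reach the threshold. A realistic system would replace the raw frequencies with a term-weighting scheme and tune the threshold on the collection.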

[1]  Mark E. Frisse, et al. Searching for information in a hypertext medical handbook, 1988.

[2]  R. A. Amsler. Machine-readable dictionaries, 1984.

[3]  Gerard Salton, et al. Improving retrieval performance by relevance feedback, 1997, J. Am. Soc. Inf. Sci.

[4]  W. Bruce Croft, et al. Inference networks for document retrieval, 1989, SIGIR '90.

[5]  Gerard Salton, et al. Flexible Text Matching for Information Retrieval, 1990.

[6]  D. C. Blair, et al. Language and Representation in Information Retrieval, 1990.

[7]  Douglas B. Lenat, et al. CYC: Using Common Sense Knowledge to Overcome Brittleness and Knowledge Acquisition Bottlenecks, 1986, AI Mag.

[8]  Paul R. Cohen, et al. Information retrieval by constrained spreading activation in semantic networks, 1987, Inf. Process. Manag.

[9]  Gerard Salton, et al. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, 1989.

[10]  Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[11]  Jeff Conklin, et al. Hypertext: An Introduction and Survey, 1987, Computer.

[12]  Umberto Eco, et al. A theory of semiotics, 1976, Advances in semiotics.

[13]  Gerard Salton, et al. On the application of syntactic methodologies in automatic text analysis, 1990, Inf. Process. Manag.

[14]  Brian C. O'Connor, et al. Language and representation in information retrieval, 1993.

[15]  Alexander Dekhtyar, et al. Information Retrieval, 2018, Lecture Notes in Computer Science.

[16]  Joel L. Fagan, et al. The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval, 1989, JASIS.

[17]  H. R. Quillian. In semantic information processing, 1968.

[18]  Gerard Salton, et al. Term-Weighting Approaches in Automatic Text Retrieval, 1988, Inf. Process. Manag.

[19]  Nicholas V. Findler, et al. Associative Networks: Representation and Use of Knowledge by Computers, 1980, CL.

[20]  Joel L. Fagan. The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval, 1989.