Recent Studies in Automatic Text Analysis and Document Retrieval

Many experts in mechanized text processing now agree that useful automatic language analysis procedures are largely unavailable and that the existing linguistic methodologies generally produce disappointing results. An attempt is made in the present study to identify those automatic procedures which appear most effective as a replacement for the missing language analysis. A series of computer experiments is described, designed to simulate a conventional document retrieval environment. It is found that a simple duplication, by automatic means, of the standard, manual document indexing and retrieval operations will not produce acceptable output results. New mechanized approaches to document handling are proposed, including document ranking methods, automatic dictionary and word list generation, and user feedback searches. It is shown that the fully automatic methodology is superior in effectiveness to the conventional procedures in normal use.
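
The abstract names document ranking and user feedback searches but gives no formulas. The following is a minimal sketch under stated assumptions, not the study's own procedure. It ranks documents by cosine similarity between term-frequency vectors and revises a query with a Rocchio-style feedback step. Both techniques are commonly associated with the SMART system, but their use here is an assumption. The helper `vectorize` (plain whitespace tokenization) and the weights `alpha`, `beta`, `gamma` are hypothetical, illustrative choices.

```python
import math
from collections import Counter

def vectorize(text):
    """Hypothetical helper: lowercase whitespace tokens -> term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dicts of term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """Return document indices ordered by decreasing similarity to the query."""
    scores = [cosine(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style feedback (assumed weights): move the query toward documents
    judged relevant and away from those judged nonrelevant; drop terms whose
    resulting weight is not positive."""
    new = Counter()
    for t, w in query.items():
        new[t] += alpha * w
    for d in relevant:
        for t, w in d.items():
            new[t] += beta * w / len(relevant)
    for d in nonrelevant:
        for t, w in d.items():
            new[t] -= gamma * w / len(nonrelevant)
    return {t: w for t, w in new.items() if w > 0}

# Usage with a toy, made-up collection.
docs = [
    vectorize("automatic indexing of documents"),
    vectorize("manual indexing by librarians"),
    vectorize("thesaurus construction for retrieval"),
]
query = vectorize("automatic indexing")
print(rank(query, docs))    # [0, 1, 2]

# The user marks document 2 relevant and document 1 nonrelevant.
revised = rocchio(query, relevant=[docs[2]], nonrelevant=[docs[1]])
print(rank(revised, docs))  # [2, 0, 1]
```

The feedback step lets a document that shares no words with the original query (document 2) move to the top once the user judges it relevant. This is the kind of effect a user feedback search aims for.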
