Composite document extended retrieval: an overview

Experimental information retrieval (IR) systems, some dating back to the 1960s, have demonstrated the viability of fully automatic document storage and retrieval with small- to medium-sized bibliographic collections [72]. Many of these experimental systems use the vector space model, in which each important term (such as a word stem) identifies a different dimension in a space, so that matrix methods and vector operations can be defined on queries and documents. Statistical techniques have been very effective, and probabilistic enhancements have yielded further improvements [84]. However, the basic vector space model is oriented towards recording the essential information in the text of a title/abstract combination rather than describing more complex document structures, so the model must be extended to handle composite documents. On the other hand, commonly available retrieval systems that employ Boolean logic queries and inverted file storage schemes can accommodate such documents without modification, albeit somewhat less effectively than is possible with more sophisticated systems. Hence, it is also of interest to consider how Boolean logic systems can be extended to give better performance, especially with composite documents, and how those approaches can be integrated with vector methods.
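To make the contrast between the two retrieval styles concrete, the following is a minimal sketch, not the implementation used by any of the systems discussed. It assumes whitespace tokenization, raw term frequencies as vector components, and cosine similarity as the matching function; none of these choices is specified in the text above, and the function names and the three-document collection are hypothetical. The first part ranks documents in a term space; the second answers a conjunctive Boolean query from an inverted file.

```python
import math
from collections import Counter, defaultdict


def term_vector(text):
    """Map a text to a sparse vector: one dimension per term, weight = count.
    Assumption: lowercase whitespace tokens stand in for word stems."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Vector space retrieval: score every document, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), doc_id)
              for doc_id, text in docs.items()]
    return sorted(scored, reverse=True)


def build_inverted_index(docs):
    """Inverted file: for each term, the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in term_vector(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Boolean retrieval: documents containing all query terms, unranked."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection, for illustration only.
docs = {
    "d1": "boolean query retrieval",
    "d2": "vector space retrieval model",
    "d3": "office mail routing",
}

ranking = rank("vector retrieval", docs)
index = build_inverted_index(docs)
matches = boolean_and(index, ["vector", "retrieval"])
```

In this sketch the vector query orders all three documents by similarity, with d2 first and d1, a partial match, second, whereas the Boolean query returns only d2 and gives no ordering. That difference is one way to read the remark that Boolean systems are somewhat less effective, and it suggests why extending them toward ranked output is of interest.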
