Searching for Text Documents

Besides text, many documents also contain images, tables, and so on. This chapter concentrates on the text part only. Traditionally, systems that handle text documents are called information storage and retrieval systems. Before the World Wide Web emerged, such systems were used almost exclusively by professional users, so-called indexers and searchers, for example in medical research, in libraries, and by governmental organizations and archives. Typically, these professionals act as “search intermediaries” for end users: in an interactive dialogue with both the system and the end user, they try to figure out what the end user needs and how this information should be used to formulate a successful search. Professionals know the collection, they know how documents in the collection are represented in the system, and they know how to use Boolean search operators to control the number of retrieved documents.
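The claim that Boolean operators let a searcher control how many documents are retrieved can be illustrated with a minimal sketch. It assumes a toy collection of four hypothetical documents, naive whitespace tokenization, and a simple inverted index. The function names are illustrative only and do not correspond to any particular system. AND narrows the result set, OR broadens it, and AND NOT removes documents from it.

```python
from collections import defaultdict

# Hypothetical toy collection (illustrative data, not from any real system).
DOCS = {
    1: "aspirin reduces heart attack risk",
    2: "aspirin dosage for children",
    3: "heart surgery outcomes in adults",
    4: "statins and heart attack prevention",
}


def build_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # naive whitespace tokenization (assumption)
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term: each extra term can only shrink the result."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one term: each extra term can only grow the result."""
    return set().union(*(index.get(t, set()) for t in terms))


def boolean_and_not(result, index, term):
    """Remove documents containing `term` from an existing result set."""
    return result - index.get(term, set())


index = build_index(DOCS)
print(sorted(boolean_or(index, "aspirin", "statins")))              # [1, 2, 4]
print(sorted(boolean_and(index, "heart", "attack")))                # [1, 4]
print(sorted(boolean_and(index, "heart", "attack", "aspirin")))     # [1]
print(sorted(boolean_and_not(boolean_and(index, "heart", "attack"), index, "aspirin")))  # [4]
```

A searcher who gets too many hits can add AND terms or an AND NOT clause. With too few hits, they can replace a term by an OR of synonyms. This is the kind of manual control over result-set size that professional intermediaries exercise.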
