Taming the Terabytes: A Human-Centered Approach to Surviving the Information Deluge

Fear of imminent information overload predates the World Wide Web by decades, yet that fear has never abated. Worse, now that the Web accounts for the lion's share of the information we deal with, both in amount and in time spent gathering it, the situation has only become more precarious. This chapter analyses new issues in information overload that have emerged with the advent of the Web, which emphasizes written communication, defined in this context as the exchange of ideas expressed informally, often casually, as in verbal language. The chapter focuses on three ways to mitigate these issues. First, it helps us, the users, to be more specific in what we ask for. Second, it helps us amend our request when we don't get what we think we asked for. Third, since only we, the human users, can judge whether the information received is what we want, it makes retrieval techniques more effective by basing them on how humans structure information. The chapter reports on extensive experiments conducted in all three areas. First, to let users describe an information need more specifically, we allowed them to express themselves in an unrestricted conversational style. This way, they could convey their information need as if they were talking to a fellow human, instead of using the two or three words typically supplied to a search engine. Second, we gave users effective ways to zoom in on the desired information once potentially relevant information became available. Third, a variety of experiments focused on the search engine itself as the mediator between request and delivery of information. All examples that are explained in detail have actually been implemented. The results of the experiments demonstrate how a human-centered approach can reduce information overload in an area that grows in importance with each passing day. Because these applications have actually been built, the approach I present is operational, not just aspirational.
