A process model for information retrieval context learning and knowledge discovery

In this paper we take a fresh look at the information retrieval (IR) problem of balancing recall with precision in electronic document extraction. We examine the IR constructs of uncertainty, context, and relevance; propose a new process model for context learning; and introduce a new IT artifact designed to support user-driven learning by leveraging explicit knowledge to discover implicit knowledge within a corpus of documents. The IT artifact is a prototype that presents a small set of documents extracted from a targeted corpus based on user-supplied criteria. Through iterative relevance feedback, the prototype lets the user balance exploration and exploitation, addressing the imprecision that results from uncertainty. We model the problem as an exploration–exploitation dilemma and apply it to a specific case of IR called eDiscovery. We conduct a series of behavioral experiments to evaluate the model and the artifact. Our initial findings indicate that the proposed model and artifact improve retrieval performance.
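
To make the exploration–exploitation framing concrete, the following is a minimal sketch of one way an iterative relevance-feedback loop could trade the two off. It is not the prototype described above: the epsilon-greedy selection rule, the term-overlap score update, and all names (`select_batch`, `update_scores`, `doc_terms`, `epsilon`, `step`) are assumptions chosen purely for illustration.

```python
import random


def select_batch(scores, seen, k, epsilon, rng):
    """Pick up to k unseen document ids (hypothetical epsilon-greedy rule).

    With probability epsilon a document is drawn at random (exploration);
    otherwise the highest-scored remaining document is taken (exploitation).
    """
    candidates = [d for d in scores if d not in seen]
    ranked = sorted(candidates, key=lambda d: scores[d], reverse=True)
    batch = []
    while ranked and len(batch) < k:
        if rng.random() < epsilon:
            pick = rng.choice(ranked)  # explore: any remaining document
        else:
            pick = ranked[0]           # exploit: current best guess
        ranked.remove(pick)
        batch.append(pick)
    return batch


def update_scores(scores, doc_terms, feedback, step=0.5):
    """Shift scores of other documents by their term overlap with judged ones.

    feedback maps a document id to True (relevant) or False (not relevant).
    This overlap heuristic is an assumption, not a method from the paper.
    """
    for doc_id, relevant in feedback.items():
        sign = 1.0 if relevant else -1.0
        for other, terms in doc_terms.items():
            if other == doc_id:
                continue
            overlap = len(doc_terms[doc_id] & terms)
            scores[other] += sign * step * overlap


# Toy corpus: each document is reduced to a set of terms (invented data).
doc_terms = {
    "d1": {"merger", "price"},
    "d2": {"merger", "lunch"},
    "d3": {"lunch", "golf"},
    "d4": {"price", "audit"},
}
scores = {d: 0.0 for d in doc_terms}
rng = random.Random(0)

first = select_batch(scores, set(), k=2, epsilon=0.0, rng=rng)
update_scores(scores, doc_terms, {"d1": True, "d2": False})
second = select_batch(scores, set(first), k=1, epsilon=0.0, rng=rng)
```

In this sketch, setting `epsilon` to 0 makes every round purely exploitative, while larger values spend more of each small batch on documents the current scores would not have surfaced, which is one simple way to probe for relevant material the initial criteria missed.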
