An interdisciplinary perspective on information retrieval

Let me begin by saying how honored I am to receive the Gerard Salton Award from SIGIR. His pioneering work, including the vector space model, term weightings, relevance feedback, and the development and evaluation of automatic text retrieval systems, defined and shaped the field in important ways. I did not have many opportunities to interact directly with Gerry Salton, although our paths did cross in interesting ways. Shortly after I started working at Bell Labs, I was introduced to the field of information retrieval by Michael Lesk, who had worked with Gerry at Harvard in the mid-1960s to develop the original SMART system. The last conference at which I interacted with Gerry was not SIGIR or TREC, as one might expect, but rather CHI 1995, where he spoke on a panel entitled 'Browsing vs. search: Can we find a synergy?', a theme to which I return.

Following the tradition of previous recipients of the Salton Award (Gerard Salton (1983), Karen Spärck Jones (1988), Cyril Cleverdon (1991), William Cooper (1994), Tefko Saracevic (1997), Stephen Robertson (2000), W. Bruce Croft (2003), and C. J. "Keith" van Rijsbergen (2006)), I present a personal reflection on information retrieval. I begin with some remarks on the problems that have motivated my research in the area for almost three decades, and the approaches that I have taken to address them. I conclude by identifying what I believe to be some important opportunities to extend and improve information retrieval moving forward.

Given my academic roots in mathematics and cognitive psychology, it is not surprising that my research interests are at the intersection of information retrieval and human-computer interaction. My approach to problems is more user-centric than system-centric, more empirical than theoretical, but I cross these boundaries at different stages of investigation.
I do so because I believe that the success of information retrieval systems depends critically on both the ability of systems to efficiently and effectively represent, match and rank objects, and the ability to understand and support individuals in articulating their information needs and analyzing the retrieved results to solve the problem that motivated their search in the first place.

My interest in information retrieval started in the early 1980s with the observation that different people use a surprisingly wide variety of words to describe the same object or concept – things like computer commands, or keywords for information objects. This fundamental characteristic of human word usage set limits on how well a simple lexical matching system that assigned only a few terms (no matter how carefully chosen) to each object could do in satisfying users. (It is symptomatic of the problem that we described this research variously as vocabulary mismatch, verbal disagreement, and statistical semantics.) We developed, deployed and evaluated solutions that involved collecting multiple aliases for objects, and reducing the dimensionality of the representation using techniques like Latent Semantic Indexing (LSI). Much of my subsequent research has been similarly motivated by the goal of identifying limitations of current retrieval systems, and developing new algorithms, interaction techniques or evaluation methods to overcome them. We have made some progress by representing and leveraging richer contextual information about users, content domains, and the larger task environments in which retrieval takes place.
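The core of LSI is a truncated singular value decomposition of the term-document matrix: terms that never co-occur in any document can still end up close together in the reduced space if they co-occur with the same other terms. The following is a minimal sketch of that idea; the toy matrix and vocabulary are illustrative assumptions, not data from the original experiments.

```python
import numpy as np

# Hypothetical toy term-document count matrix: rows are terms, columns are
# documents. Counts are invented for illustration.
terms = ["car", "automobile", "engine", "flower", "petal"]
#              d1  d2  d3  d4
A = np.array([[2,  0,  1,  0],    # car
              [0,  2,  1,  0],    # automobile
              [1,  1,  2,  0],    # engine
              [0,  0,  0,  2],    # flower
              [0,  0,  0,  1]],   # petal
             dtype=float)

# LSI: truncated SVD of the term-document matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                   # keep only the k largest singular values
term_vecs = U[:, :k] * s[:k]            # term vectors in the k-dim latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "car" and "automobile" never co-occur in any document (raw cosine of their
# count vectors is only 0.2), yet both co-occur with "engine", so their latent
# vectors become nearly identical.
print(cosine(term_vecs[0], term_vecs[1]))   # close to 1.0
```

The dimensionality reduction is what distinguishes this from plain vector-space retrieval: by discarding the smaller singular values, the model smooths over exactly the kind of word-choice variability described above.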
For example, we have modeled searchers' interests and activities over time to personalize search results and better support people in re-finding information they have previously seen; identified attributes that reflect the varied relationships between objects and tightly coupled faceted browsing of these attributes with search to support more flexible access strategies; and used some simple task contexts to proactively retrieve related information. Although we are encouraged by these examples, there is much to be done theoretically to integrate knowledge about searchers and tasks into a consistent framework, and empirically to extend evaluation methodologies to better capture the iterative and interactive nature of search.

Copyright is held by the author/owner(s).
SIGIR'09, July 19–23, 2009, Boston, Massachusetts, USA.
ACM 978-1-60558-483-6/09/07.