Connecting topics in document collections with stepping stones and pathways

In this paper, we present Stepping Stones and Pathways (SSP), an alternative model for building and presenting answers when queries on document collections cannot be answered by a ranked list alone. SSP can handle questions such as "What is the relation between topics X and Y?" It addresses cases where the answer requires the contents of a small set of related documents rather than a single document, or where "query splitting" is required to explore a document space satisfactorily. Query results are networks of document groups, each group representing a topic and connected through documents to other groups in the network. The network as a whole thus answers the user's information need. We devise new, more effective representations and techniques to visualize such answers and to involve users in the answer-finding process. To validate our approach, and because the questions we aim to answer involve multiple topics, we performed a study using a custom-built, broad collection of operating systems research papers and evaluated the results with interested computer science students using multiple measures.
