The authors present a graph-based algorithm for searching potentially large collections of unstructured data, and discuss its implementation as a search engine designed to offer advanced relevance feedback features to users who may have limited familiarity with search tools. The technique, which closely resembles the spreading activation network model described by Scott Preece, uses a term-document matrix to generate a bipartite graph of term and document nodes representing the document collection. This graph can be searched by a simple recursive procedure that distributes energy outward from an initial query node. Nodes that acquire energy above a specified threshold make up the result set. Initial results on live collections suggest that this technique may offer performance comparable to latent semantic indexing (LSI), while avoiding some of that technique’s computational pitfalls. Both the algorithm and its implementation in a production Web environment are discussed.
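The search procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how energy is divided among neighbors or when recursion stops, so the even split, the `decay` factor, and the names `build_graph`, `spread`, and `search` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(doc_terms):
    """Build a bipartite term/document graph from a {doc_id: [terms]} mapping.

    The mapping stands in for the nonzero entries of a term-document matrix.
    Nodes are tagged tuples so a term and a document can never collide.
    """
    graph = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            graph[("doc", doc)].add(("term", term))
            graph[("term", term)].add(("doc", doc))
    return graph


def spread(graph, node, energy, threshold, decay, acquired):
    """Recursively pass energy from `node` to its neighbors.

    Assumption: each neighbor receives an equal share of the decayed energy,
    and recursion stops once that share drops below the threshold.
    """
    acquired[node] += energy
    neighbors = graph.get(node, ())
    if not neighbors:
        return
    share = energy * decay / len(neighbors)
    if share < threshold:
        return
    for neighbor in neighbors:
        spread(graph, neighbor, share, threshold, decay, acquired)


def search(graph, query_node, energy=100.0, threshold=1.0, decay=0.5):
    """Return every node (other than the query) whose energy meets the threshold."""
    acquired = defaultdict(float)
    spread(graph, query_node, energy, threshold, decay, acquired)
    return {
        node: total
        for node, total in acquired.items()
        if node != query_node and total >= threshold
    }


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": ["graph", "search"],
    "d2": ["graph", "matrix"],
    "d3": ["poetry"],
}
graph = build_graph(docs)
results = search(graph, ("term", "search"))
for node, total in sorted(results.items(), key=lambda item: -item[1]):
    print(node, round(total, 3))
```

In this toy collection, a query on the term "search" reaches document d2 even though d2 does not contain that term, because energy flows through the shared term "graph". Document d3 shares no terms with the others and receives nothing. The `decay` factor below 1 is included so that the recursion is guaranteed to terminate even when a node has a single neighbor.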
[1] William M. Pottenger, et al. The Role of the HDDI Collection Builder in Hierarchical Distributed Dynamic Indexing. 2004.
[2] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis. J. Am. Soc. Inf. Sci., 1990.
[3] Walter Kintsch, et al. A Computational Theory of Complex Problem Solving Using Latent Semantic Analysis. 2002.
[4] Elizabeth R. Jessup, et al. Matrices, Vector Spaces, and Information Retrieval. SIAM Rev., 1999.
[5] Hongyuan Zha, et al. On Updating Problems in Latent Semantic Indexing. SIAM J. Sci. Comput., 1997.
[6] Alexander F. Gelbukh, et al. Information Retrieval with Conceptual Graph Matching. DEXA, 2000.
[7] Treebank Penn, et al. Linguistic Data Consortium. 1999.
[8] Scott Everett Preece. A spreading activation network model for information retrieval. 1981.