Semantic Search of Unstructured Data using Contextual Network Graphs

The authors present a graph-based algorithm for searching potentially large collections of unstructured data, and discuss its implementation as a search engine designed to offer advanced relevance feedback features to users who may have limited familiarity with search tools. The technique, which closely resembles the spreading activation network model described by Scott Preece, uses a term-document matrix to generate a bipartite graph of term and document nodes representing the document collection. This graph can be searched by a simple recursive procedure that distributes energy from an initial query node. Nodes that acquire energy above a specified threshold make up the result set. Initial results on live collections suggest that this technique may offer performance comparable to latent semantic indexing (LSI), while avoiding some of LSI’s computational pitfalls. Both the algorithm and its implementation in a production Web environment are discussed.
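
The search procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (build_graph, spread, search), the unweighted edges, the equal split of energy among neighbours, the decay factor, and the default parameter values are all assumptions made for illustration; the abstract specifies only a bipartite term-document graph, a recursive distribution of energy from a query node, and a threshold on acquired energy.

```python
from collections import defaultdict


def build_graph(docs):
    """Build a bipartite graph from {doc_id: text}.

    Nodes are ("doc", id) or ("term", word); an edge joins a term to every
    document containing it. Edges are unweighted here (an assumption).
    """
    graph = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            graph[("doc", doc_id)].add(("term", term))
            graph[("term", term)].add(("doc", doc_id))
    return graph


def spread(graph, node, energy, threshold, decay, acquired):
    """Recursively distribute energy from node to its neighbours."""
    acquired[node] += energy
    neighbours = graph.get(node, ())
    if not neighbours:
        return
    # Assumed rule: attenuate by `decay`, then split equally among neighbours.
    share = energy * decay / len(neighbours)
    if share < threshold:
        return  # too little energy left to pass on; recursion ends here
    for neighbour in neighbours:
        spread(graph, neighbour, share, threshold, decay, acquired)


def search(graph, query_term, energy=1.0, threshold=0.01, decay=0.5):
    """Return (doc_id, energy) pairs whose acquired energy exceeds threshold."""
    acquired = defaultdict(float)
    spread(graph, ("term", query_term), energy, threshold, decay, acquired)
    hits = [
        (node[1], e)
        for node, e in acquired.items()
        if node[0] == "doc" and e > threshold
    ]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    docs = {
        "d1": "graph search engine",
        "d2": "graph theory",
        "d3": "latent semantic indexing",
    }
    for doc_id, e in search(build_graph(docs), "search"):
        print(doc_id, round(e, 4))
```

In this toy collection, a query for "search" reaches d2 even though d2 does not contain the query term, because energy flows from d1 through the shared term "graph"; d3 shares no terms with the others and acquires no energy. Because decay is below 1, each recursive call passes on strictly less energy than it received, so the recursion terminates once the share drops below the threshold.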