Constructing Web Views from Automated Navigation Sessions

Existing web search engines let users query an offline database of indices in order to choose an entry point for further manual navigation. Results are often presented as a list of URLs in descending order of relevance, with no information on the underlying topology of the result set. We believe that topological information is important for useful exploration and can also help reduce the disorientation that users experience. We present an alternative to the result set of a conventional search engine, which we call a web probabilistic view: a weighted subgraph of the underlying document space that maximizes the overall expected relevance of trails. We model a web database as a probabilistic grammar and present several algorithms for calculating its weights, which are expressed as probabilities attached to transition rules. We report results from a recent set of experiments that show the effectiveness of our approach, measured as the improvement in the expected relevance of the grammar.
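
To make the notion of expected relevance over trails concrete, the following minimal sketch treats a view as a small weighted graph whose edge weights are transition probabilities, and averages a trail relevance score over all trails up to a fixed length. It is an illustration only, not one of the algorithms referred to above: the page names, probabilities, and relevance scores are invented toy data, the helper names are hypothetical, and scoring a trail by the mean relevance of its pages is an assumption made for this example.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy view: page -> {successor page -> transition probability}.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {},
}
# Hypothetical per-page relevance scores in [0, 1].
RELEVANCE: Dict[str, float] = {"A": 0.5, "B": 1.0, "C": 0.0}
# Hypothetical distribution over entry points.
START: Dict[str, float] = {"A": 1.0}


def enumerate_trails(
    start: Dict[str, float],
    transitions: Dict[str, Dict[str, float]],
    max_len: int,
) -> Iterator[Tuple[List[str], float]]:
    """Yield (trail, probability) for each trail that either reaches
    max_len pages or ends at a page with no outgoing transitions."""
    stack = [([page], prob) for page, prob in start.items()]
    while stack:
        trail, prob = stack.pop()
        successors = transitions.get(trail[-1], {})
        if len(trail) == max_len or not successors:
            yield trail, prob
            continue
        for nxt, p in successors.items():
            stack.append((trail + [nxt], prob * p))


def trail_relevance(trail: List[str], relevance: Dict[str, float]) -> float:
    """Assumed scoring rule: the mean relevance of the pages on the trail."""
    return sum(relevance[page] for page in trail) / len(trail)


def expected_relevance(
    start: Dict[str, float],
    transitions: Dict[str, Dict[str, float]],
    relevance: Dict[str, float],
    max_len: int,
) -> float:
    """Probability-weighted average of trail relevance over all trails."""
    return sum(
        prob * trail_relevance(trail, relevance)
        for trail, prob in enumerate_trails(start, transitions, max_len)
    )


if __name__ == "__main__":
    print(expected_relevance(START, TRANSITIONS, RELEVANCE, max_len=3))
```

Under these assumptions, changing the transition probabilities changes the expected relevance, which is the quantity a weighting algorithm would try to increase. Exhaustive enumeration is used here only for clarity; it grows exponentially with trail length and would not scale to a real document space.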