Surrogate scoring for improved metasearch precision

We describe a method for improving the precision of metasearch results by scoring the visual features of documents' surrogate representations. These surrogate scores are used during fusion in place of the original scores or ranks provided by the underlying search engines. Visual features are extracted from the surrogate information typically returned with a search result, such as the title, snippet, URL, and rank. This approach deliberately avoids the search-engine-specific scores and collection statistics required by most traditional fusion strategies, a restriction that reflects the use of metasearch in practice, where knowledge of the underlying search engines' internal strategies cannot be assumed. We evaluate our approach using a precision-oriented test collection of manually constructed binary relevance judgments for the top ten results from ten web search engines over 896 queries. We show that our visual fusion approach significantly outperforms the rCombMNZ fusion algorithm by 5.71% and the best individual web search engine by 10.9%, both with 99% confidence.
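To make the fusion step concrete, the sketch below shows one plausible reading of surrogate-based fusion in the CombMNZ family: each engine's results are rescored from surrogate fields alone, then combined by summing scores and multiplying by the number of engines that returned the document. The `surrogate_score` function here is a hypothetical stand-in (query-term overlap in the title and snippet plus a rank prior), not the paper's actual visual feature scorer, and the field names `title`, `snippet`, `url`, and `rank` are assumptions about the result representation.

```python
from collections import defaultdict

def surrogate_score(result, query_terms):
    """Hypothetical surrogate score: fraction of query terms appearing
    in the title and snippet, plus a small rank-based prior. The paper's
    real visual features and weights are not reproduced here."""
    text = (result["title"] + " " + result["snippet"]).lower()
    overlap = sum(1 for t in query_terms if t in text) / max(len(query_terms), 1)
    rank_prior = 1.0 / result["rank"]  # earlier ranks get a small boost
    return overlap + 0.1 * rank_prior

def fuse(result_lists, query):
    """CombMNZ-style fusion with surrogate scores substituted for
    engine-native scores: each document's fused score is the sum of its
    surrogate scores times the number of engines that returned it."""
    terms = query.lower().split()
    scores, hits = defaultdict(float), defaultdict(int)
    for results in result_lists:  # one ranked list per search engine
        for r in results:
            scores[r["url"]] += surrogate_score(r, terms)
            hits[r["url"]] += 1
    fused = {url: s * hits[url] for url, s in scores.items()}
    return sorted(fused, key=fused.get, reverse=True)
```

Because the score depends only on the surrogate, this fusion requires no knowledge of how any underlying engine computed its ranking, which is the constraint the abstract emphasizes.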