The information on the World Wide Web is growing without bound, and users may have very diverse preferences regarding the pages they look for through a search engine. Adapting a search engine to the needs of a particular community of users who share similar interests is therefore a challenging task. In this paper, we propose a new algorithm, Ranking SVM in a Co-training Framework (RSCF). The RSCF algorithm takes as input clickthrough data, that is, the items in the search results that a user has clicked on, and produces adaptive rankers as output. By analyzing the clickthrough data, RSCF first divides the data into a labelled data set, containing the items that have already been scanned, and an unlabelled data set, containing the items that have not yet been scanned. The labelled data is then augmented with unlabelled data to obtain a larger data set for training the rankers. We demonstrate that the RSCF algorithm produces better ranking results than the standard Ranking SVM algorithm. Based on RSCF, we develop a metasearch engine that comprises MSNSearch, Wisenut, and Overture, and carry out an online experiment to show that our metasearch engine outperforms Google.
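The split into labelled and unlabelled data described above can be illustrated with a minimal sketch. It is not the RSCF implementation: it assumes, for illustration only, that a user scans results top-down and stops at the last clicked item, and that a clicked item is preferred over unclicked items ranked above it. The function names `split_by_scan` and `preference_pairs` are hypothetical.

```python
from typing import List, Set, Tuple


def split_by_scan(ranked: List[str], clicked: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a ranked result list into scanned (labelled) and unscanned (unlabelled) items.

    Assumption (not from the abstract): the user reads top-down and
    stops scanning at the last item that was clicked.
    """
    last = max((i for i, doc in enumerate(ranked) if doc in clicked), default=-1)
    return ranked[: last + 1], ranked[last + 1:]


def preference_pairs(scanned: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Derive (preferred, less_preferred) pairs from the scanned items.

    Assumption: a clicked item is preferred over every unclicked item
    that was ranked above it.
    """
    pairs = []
    for i, doc in enumerate(scanned):
        if doc in clicked:
            pairs.extend((doc, above) for above in scanned[:i] if above not in clicked)
    return pairs


ranked = ["a", "b", "c", "d", "e"]
clicked = {"b", "d"}
labelled, unlabelled = split_by_scan(ranked, clicked)
pairs = preference_pairs(labelled, clicked)
```

Under these assumptions, pairs of this kind could serve as training constraints for a ranker, while the unscanned items form the pool from which additional training data would be drawn.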