A large-scale study of the effect of training set characteristics over learning-to-rank algorithms
In this work we describe the results of a large-scale study of how the distribution of labels across the different grades of relevance in the training set affects the performance of trained ranking functions. In a controlled experiment, we generate a large number of training datasets with different label distributions and apply three learning-to-rank algorithms to these datasets. We investigate the effect of these distributions on the accuracy of the resulting ranking functions, to give insight into how training sets should be constructed.
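As an illustration of what generating training sets with different label distributions could look like, here is a minimal Python sketch. It is not the procedure used in the study: the function name `sample_by_label_distribution`, the integer grade encoding, and the per-grade random subsampling are all assumptions made for this example.

```python
import random
from collections import defaultdict


def sample_by_label_distribution(labels, target, size, seed=0):
    """Return indices of `size` items whose grade mix approximates `target`.

    labels: list of integer relevance grades, one per labeled item
            (hypothetical encoding, e.g. 0 = non-relevant, 2 = highly relevant).
    target: dict mapping grade -> desired fraction of the sample.
    Per-grade quotas are rounded, so the returned sample can differ from
    `size` by a few items when the fractions do not divide evenly.
    """
    rng = random.Random(seed)

    # Group item indices by their relevance grade.
    by_grade = defaultdict(list)
    for idx, grade in enumerate(labels):
        by_grade[grade].append(idx)

    chosen = []
    for grade, fraction in target.items():
        quota = round(fraction * size)
        pool = by_grade.get(grade, [])
        if quota > len(pool):
            raise ValueError(
                f"grade {grade}: need {quota} items, only {len(pool)} available"
            )
        # Sample without replacement within each grade.
        chosen.extend(rng.sample(pool, quota))

    rng.shuffle(chosen)
    return chosen


# Toy pool of judged items: 100 of grade 0, 50 of grade 1, 20 of grade 2.
pool_labels = [0] * 100 + [1] * 50 + [2] * 20
sample = sample_by_label_distribution(
    pool_labels, {0: 0.5, 1: 0.25, 2: 0.25}, size=40
)
```

Varying the `target` fractions while holding `size` fixed would give a family of training sets that differ only in their label distribution, which is the kind of controlled variation the abstract describes.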