What makes data robust: a data analysis in learning to rank

When learning to rank algorithms are applied in real search applications, noise in human-labeled training data is an inevitable problem that affects algorithm performance. Previous work has mainly focused on how noise affects ranking algorithms and how to design robust ranking algorithms. In this work, we investigate what inherent characteristics make training data robust to label noise. Our motivation comes from an interesting observation: the same ranking algorithm may show very different sensitivities to label noise across different data sets. We investigate the underlying reason for this observation using two typical kinds of learning to rank algorithms (i.e.~pairwise and listwise methods) and three public data sets (i.e.~OHSUMED, TD2003 and MSLR-WEB10K). We find that as label noise increases in training data, it is the \emph{document pair noise ratio} (i.e.~\emph{pNoise}) rather than the \emph{document noise ratio} (i.e.~\emph{dNoise}) that explains the performance degradation of a ranking algorithm.
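The abstract contrasts two noise measures: dNoise, the fraction of documents whose labels are corrupted, and pNoise, the fraction of document pairs whose preference relation is corrupted. Below is a minimal Python sketch of how such ratios could be computed for a single query's labels; the paper's exact definitions are not reproduced here, so the pair-level rule (a pair counts as noisy if its clean preference is reversed or erased) is an assumption, and the function names and toy labels are hypothetical.

```python
# Hypothetical sketch of document-level vs. pair-level noise ratios.
# Assumption: dNoise = fraction of documents whose label changed;
#             pNoise = fraction of document pairs with a clean preference
#             whose preference is reversed or erased by the noise.
from itertools import combinations
from typing import Sequence


def d_noise(clean: Sequence[int], noisy: Sequence[int]) -> float:
    """Document noise ratio: share of documents with a changed label."""
    assert len(clean) == len(noisy)
    changed = sum(c != n for c, n in zip(clean, noisy))
    return changed / len(clean)


def p_noise(clean: Sequence[int], noisy: Sequence[int]) -> float:
    """Document pair noise ratio: share of pairs ordered by the clean
    labels whose preference is reversed or erased in the noisy labels."""
    assert len(clean) == len(noisy)
    ordered, corrupted = 0, 0
    for i, j in combinations(range(len(clean)), 2):
        diff_clean = clean[i] - clean[j]
        if diff_clean == 0:
            continue  # no preference between this pair in the clean labels
        ordered += 1
        diff_noisy = noisy[i] - noisy[j]
        # noisy pair: preference disappears (tie) or flips direction
        if diff_noisy == 0 or (diff_clean > 0) != (diff_noisy > 0):
            corrupted += 1
    return corrupted / ordered if ordered else 0.0


if __name__ == "__main__":
    clean = [2, 1, 0, 0, 1]  # graded relevance labels for one query
    noisy = [2, 0, 0, 1, 1]  # the same labels after injected noise
    print(f"dNoise = {d_noise(clean, noisy):.2f}")  # 0.40
    print(f"pNoise = {p_noise(clean, noisy):.2f}")  # 0.38
```

Even in this toy example the two ratios diverge, which is the kind of gap the paper exploits: corrupting a few document labels can corrupt a quite different share of the pairwise preferences that pairwise and listwise learners actually train on.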
