Scalability and Performance of Random Forest based Learning-to-Rank for Information Retrieval

For a query submitted by a user, the goal of an information retrieval system is to return a list of documents ranked by their relevance to that query. Traditionally, scoring methods ranging from simple heuristics to probabilistic models have been used for this task. More recently, researchers have turned to supervised machine learning techniques for this problem, a formulation known as learning-to-rank (LtR). Many supervised learning methods have been tested so far, with empirical success over conventional methods [4]. The random forest is a relatively simple yet effective and efficient learning algorithm that aggregates the predictions of a large number of independent and diverse base learners, namely decision trees. Its major benefits over other state-of-the-art methods include inherent parallelizability, ease of tuning, and competitive performance. These benefits have made the random forest a popular choice across many disciplines; for the LtR task, however, it has not been thoroughly investigated. In this research, we investigate random-forest-based LtR algorithms, aiming to improve their efficiency and effectiveness and to deepen our understanding of them. With respect to the first goal, we employ undersampling techniques and leverage the inherent structure of a random forest to achieve better scalability, especially on highly imbalanced datasets [2]; we also reduce the correlation among the trees, which shortens learning time and improves performance [3]. With respect to the second goal, we investigate various objective functions, ranging from a completely randomized splitting criterion to so-called listwise splitting [1], and conduct a thorough study of random-forest-based pointwise algorithms. With respect to the third goal, we develop methods for estimating the bias and variance of rank-learning algorithms and examine their empirical behavior as the parameters of the learning algorithm vary.
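To make these ideas concrete, below is a minimal Python sketch (using numpy and scikit-learn) of two of the techniques named above: per-query undersampling of non-relevant documents, followed by pointwise random-forest training in which documents are ranked by the forest's averaged tree predictions. The helper name undersample_per_query, the ratio parameter, and the toy data are illustrative assumptions, not the implementation evaluated in this research.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def undersample_per_query(X, y, qids, ratio=2.0, rng=None):
    """Hypothetical helper: per query, keep every relevant document
    (label > 0) and at most `ratio` times as many randomly sampled
    non-relevant documents (label == 0)."""
    rng = rng or np.random.default_rng(0)
    keep = []
    for q in np.unique(qids):
        idx = np.where(qids == q)[0]
        rel, nonrel = idx[y[idx] > 0], idx[y[idx] == 0]
        keep.extend(rel)
        n_keep = min(len(nonrel), int(ratio * max(len(rel), 1)))
        if n_keep:
            keep.extend(rng.choice(nonrel, size=n_keep, replace=False))
    keep = np.sort(np.asarray(keep))
    return X[keep], y[keep], qids[keep]

# Toy data: 500 documents over 20 queries, 10 features, ~90% non-relevant.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
qids = rng.integers(0, 20, size=500)
y = (rng.random(500) < 0.1) * rng.integers(1, 4, size=500).astype(float)

Xs, ys, _ = undersample_per_query(X, y, qids, ratio=2.0, rng=rng)

# Pointwise training: regress graded relevance labels. The forest averages
# many randomized trees; documents are ranked by predicted score.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt", n_jobs=-1)
forest.fit(Xs, ys)
scores = forest.predict(X[qids == 0])   # score the documents of one query
ranking = np.argsort(-scores)           # descending order of predicted relevance

For the third goal, one standard regression-style estimator (again a sketch under stated assumptions, not necessarily the estimator developed in this research) retrains the forest on bootstrap resamples and decomposes the squared error of the predicted scores on held-out documents; resampling whole queries rather than individual documents would respect the query structure more faithfully.

def bias_variance(X_tr, y_tr, X_te, y_te, n_rep=20, seed=1):
    """Estimate squared bias (confounded with irreducible noise) and variance
    of the forest's scores over bootstrap-resampled training sets."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_rep):
        boot = rng.integers(0, len(X_tr), size=len(X_tr))  # document-level bootstrap
        m = RandomForestRegressor(n_estimators=50, n_jobs=-1)
        m.fit(X_tr[boot], y_tr[boot])
        preds.append(m.predict(X_te))
    preds = np.stack(preds)                       # shape: (n_rep, n_test)
    bias_sq = np.mean((preds.mean(axis=0) - y_te) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance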

[1] Tie-Yan Liu et al. Learning to Rank for Information Retrieval, 2011.

[2] Tian Xia et al. Direct optimization of ranking measures for learning to rank models, 2013, KDD.

[3] D. Sculley et al. Rank Aggregation for Similar Items, 2007, SDM.

[4] Hemant Ishwaran et al. The effect of splitting on random forests, 2014, Machine Learning.

[5] Ramesh Nallapati et al. Discriminative models for information retrieval, 2004, SIGIR '04.

[6] Yoram Singer et al. An Efficient Boosting Algorithm for Combining Preferences, 2013.

[7] Yanjun Qi. Random Forest for Bioinformatics, 2012.

[8] Joydeep Ghosh et al. Investigation of the random forest framework for classification of hyperspectral data, 2005, IEEE Transactions on Geoscience and Remote Sensing.

[9] Mark Sanderson et al. Features of Disagreement Between Retrieval Effectiveness Measures, 2015, SIGIR.

[10] Stephen E. Robertson et al. On the choice of effectiveness measures for learning to rank, 2010, Information Retrieval.

[11] Stephen E. Robertson et al. A probabilistic model of information retrieval: development and comparative experiments - Part 1, 2000, Inf. Process. Manag.

[12] José Augusto Baranauskas et al. How Many Trees in a Random Forest?, 2012, MLDM.

[13] Fen Xia et al. Ordinal Regression as Multiclass Classification, 2007.

[14] David Hawking et al. Challenges in Enterprise Search, 2004, ADC.

[15] Gregory N. Hullender et al. Learning to rank using gradient descent, 2005, ICML.

[16] Ben Carterette et al. On rank correlation and the distance between rankings, 2009, SIGIR.

[17] Craig MacDonald et al. Retrieval sensitivity under training using different measures, 2008, SIGIR '08.

[18] Yan Zhang et al. Comparison of random forest, random ferns and support vector machine for eye state classification, 2015, Multimedia Tools and Applications.

[19] Pedro M. Domingos. A Unified Bias-Variance Decomposition, 2000.

[20] Thomas G. Dietterich et al. Error-Correcting Output Coding Corrects Bias and Variance, 1995, ICML.

[21] Gérard Biau et al. Analysis of a Random Forests Model, 2010, J. Mach. Learn. Res.

[22] Dmitry Yurievich Pavlov et al. BagBoo: a scalable hybrid bagging-the-boosting model, 2010, CIKM '10.

[23] Seetha Hari et al. Learning From Imbalanced Data, 2019, Advances in Computer and Electrical Engineering.

[24] Joseph Sexton et al. Standard errors for bagged and random forest estimators, 2009, Comput. Stat. Data Anal.

[25] Donald Metzler. Estimation, sensitivity, and generalization in parameterized retrieval models, 2006, CIKM '06.

[26] Stefan Wager. Asymptotic Theory for Random Forests, 2014, arXiv:1405.0352.

[27] Vladimir N. Vapnik et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[28] ChengXiang Zhai et al. Adaptive term frequency normalization for BM25, 2011, CIKM '11.

[29] Claudio Carpineto et al. Query Difficulty, Robustness, and Selective Application of Query Expansion, 2004, ECIR.

[30] Pierre Geurts et al. Extremely randomized trees, 2006, Machine Learning.

[31] Qiang Wu et al. Adapting boosting for information retrieval measures, 2010, Information Retrieval.

[32] Tie-Yan Liu. Learning to Rank for Information Retrieval, 2009, Found. Trends Inf. Retr.

[33] Ellen M. Voorhees et al. Variations in relevance judgments and the measurement of retrieval effectiveness, 1998, SIGIR '98.

[34] Thore Graepel et al. Large Margin Rank Boundaries for Ordinal Regression, 2000.

[35] Mark Sanderson et al. The relationship between IR effectiveness measures and user satisfaction, 2007, SIGIR.

[36] Mark Sanderson et al. Size and Source Matter: Understanding Inconsistencies in Test Collection-Based Evaluation, 2014, CIKM.

[37] Raffaele Perego et al. QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees, 2015, SIGIR.

[38] John Gantz et al. The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East, 2012.

[39] Ramón Díaz-Uriarte et al. Gene selection and classification of microarray data using random forest, 2006, BMC Bioinformatics.

[40] Moni Naor et al. Rank aggregation methods for the Web, 2001, WWW '01.

[41] C. Burges. Learning to Rank Using Classification and Gradient Boosting, 2008.

[42] Tao Qin et al. Selecting optimal training data for learning to rank, 2011, Inf. Process. Manag.

[43] R. Tibshirani et al. An introduction to the bootstrap, 1993.

[44] W. Bruce Croft et al. Learning concept importance using a weighted dependence model, 2010, WSDM '10.

[45] Luc Devroye et al. Consistency of Random Forests and Other Averaging Classifiers, 2008, J. Mach. Learn. Res.

[46] Christopher J. C. Burges. From RankNet to LambdaRank to LambdaMART: An Overview, 2010.

[47] Robert R. Freimuth et al. A weighted random forests approach to improve predictive performance, 2013, Stat. Anal. Data Min.

[48] Gerard Salton et al. Term-Weighting Approaches in Automatic Text Retrieval, 1988, Inf. Process. Manag.

[49] Leo Breiman et al. Bagging Predictors, 1996, Machine Learning.

[50] Senén Barro et al. Do we need hundreds of classifiers to solve real world classification problems?, 2014, J. Mach. Learn. Res.

[51] W. Bruce Croft et al. Linear feature-based models for information retrieval, 2007, Information Retrieval.

[52] Xueqi Cheng et al. What makes data robust: a data analysis in learning to rank, 2014, SIGIR.

[53] Tie-Yan Liu et al. Adapting ranking SVM to document retrieval, 2006, SIGIR.

[54] Gilles Louppe et al. Learning to rank with extremely randomized trees, 2010, Yahoo! Learning to Rank Challenge.

[55] Rich Caruana et al. Distributed tuning of machine learning algorithms using MapReduce Clusters, 2011, LDMTA '11.

[56] Tao Qin et al. LETOR: A benchmark collection for research on learning to rank for information retrieval, 2010, Information Retrieval.

[57] Antonio Criminisi et al. Decision Forests for Computer Vision and Medical Image Analysis, 2013, Advances in Computer Vision and Pattern Recognition.

[58] Stephen E. Robertson et al. Deep versus shallow judgments in learning to rank, 2009, SIGIR.

[59] Gerard Salton et al. A vector space model for automatic indexing, 1975, CACM.

[60] Jerome H. Friedman. On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality, 1997.

[61] Yi Chang et al. Yahoo! Learning to Rank Challenge Overview, 2010, Yahoo! Learning to Rank Challenge.

[62] W. Bruce Croft et al. Feature Selection for Document Ranking using Best First Search and Coordinate Ascent, 2010.

[63] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[64] Mark R. Segal. Machine Learning Benchmarks and Random Forest Regression, 2004.

[65] Virgil Pavlu. Large Scale IR Evaluation, 2008.

[66] A. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance, 2004, ICML '04.

[67] Pierre Geurts et al. Bias vs Variance Decomposition for Regression and Classification, 2005, Data Mining and Knowledge Discovery Handbook.

[68] Antonio Criminisi et al. Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, 2012, Found. Trends Comput. Graph. Vis.

[69] Tie-Yan Liu et al. A Theoretical Analysis of NDCG Type Ranking Measures, 2013, COLT.

[70] Rodrygo L. T. Santos et al. The whens and hows of learning to rank for web search, 2012, Information Retrieval.

[71] Filip Radlinski et al. A support vector method for optimizing average precision, 2007, SIGIR.

[72] Tao Qin et al. Query-level stability and generalization in learning to rank, 2008, ICML '08.

[73] Kushagra Vaid et al. Web Search Using Small Cores: Quantifying the Price of Efficiency, 2009.

[74] Tie-Yan Liu et al. Listwise approach to learning to rank: theory and algorithm, 2008, ICML '08.

[75] Stephen E. Robertson et al. SoftRank: optimizing non-smooth rank metrics, 2008, WSDM '08.

[76] J. Friedman. Stochastic gradient boosting, 2002.

[77] Pinar Donmez et al. On the local optimality of LambdaRank, 2009, SIGIR.

[78] Katja Hofmann et al. Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval, 2013, Information Retrieval.

[79] Olivier Chapelle et al. Expected reciprocal rank for graded relevance, 2009, CIKM.

[80] Thorsten Joachims. Optimizing search engines using clickthrough data, 2002, KDD.

[81] Hongyuan Zha et al. A regression framework for learning ranking functions using relative relevance judgments, 2007, SIGIR.

[82] Yixin Chen et al. Optimal Action Extraction for Random Forests and Boosted Trees, 2015, KDD.

[83] Marko Robnik-Sikonja. Improving Random Forests, 2004, ECML.

[84] Hans-Peter Piepho et al. A comparison of random forests, boosting and support vector machines for genomic selection, 2011, BMC Proceedings.

[85] Jaime G. Carbonell et al. Optimizing estimated loss reduction for active sampling in rank learning, 2008, ICML '08.

[86] Geoffrey I. Webb et al. Deep Broad Learning - Big Models for Big Data, 2015, arXiv.

[87] W. Bruce Croft et al. Search Engines - Information Retrieval in Practice, 2009.

[88] Leo Breiman et al. Random Forests, 2001, Machine Learning.

[89] W. B. Croft et al. Indri at TREC 2006: Lessons Learned From Three Terabyte Tracks, 2006.

[90] Johannes R. Sveinsson et al. Random Forests for land cover classification, 2006, Pattern Recognit. Lett.

[91] Tie-Yan Liu et al. Generalization analysis of listwise learning-to-rank algorithms, 2009, ICML '09.

[92] W. Bruce Croft et al. A Language Modeling Approach to Information Retrieval, 1998, SIGIR Forum.

[93] O. Chapelle. Large margin optimization of ranking measures, 2007.

[94] Hang Li et al. AdaRank: a boosting algorithm for information retrieval, 2007, SIGIR.

[95] Guoyi Zhang. Bias-corrected random forests in regression, 2012.

[96] Tong Zhang et al. Subset Ranking Using Regression, 2006, COLT.

[97] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[98] Laurent Heutte et al. Towards a Better Understanding of Random Forests through the Study of Strength and Correlation, 2009, ICIC.

[99] Hang Li. Learning to Rank for Information Retrieval and Natural Language Processing, 2011, Synthesis Lectures on Human Language Technologies.

[100] Hang Li et al. Top-k Consistency of Learning to Rank Methods, 2009.

[101] Ron Kohavi et al. Bias Plus Variance Decomposition for Zero-One Loss Functions, 1996, ICML.

[102] Andrew Trotman et al. Improvements to BM25 and Language Models Examined, 2014, ADCS.

[103] Muhammad Ibrahim et al. Improving Scalability and Performance of Random Forest Based Learning-to-Rank Algorithms by Aggressive Subsampling, 2014, AusDM.

[104] Ananth Mohan et al. An Empirical Analysis on Point-wise Machine Learning Techniques using Regression Trees for Web-search Ranking, 2010.

[105] Robin Genuer. Variance reduction in purely random forests, 2012.

[106] P. Bühlmann et al. Analyzing Bagging, 2001.

[107] D. Sculley. Large Scale Learning to Rank, 2009.

[108] Laurent Heutte et al. On the selection of decision trees in Random Forests, 2009, International Joint Conference on Neural Networks.

[109] Raymond Y. K. Lau et al. Toward a semantic granularity model for domain-specific information retrieval, 2011, TOIS.

[110] Amnon Shashua et al. Ranking with Large Margin Principle: Two Approaches, 2002, NIPS.

[111] Muhammad Ibrahim et al. Undersampling Techniques to Re-balance Training Data for Large Scale Learning-to-Rank, 2014, AIRS.

[112] Stephen E. Robertson et al. A new rank correlation coefficient for information retrieval, 2008, SIGIR '08.

[113] Allan Hanbury et al. Toward a model of domain-specific search, 2013, OAIR.

[114] W. Bruce Croft et al. Improving the effectiveness of information retrieval with local context analysis, 2000, TOIS.

[115] Lawrence O. Hall et al. A Comparison of Decision Tree Ensemble Creation Techniques, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[116] Mark Sanderson et al. Do user preferences and evaluation measures line up?, 2010, SIGIR.

[117] Yunming Ye et al. Hybrid Random Forests: Advantages of Mixed Trees in Classifying Text Data, 2012, PAKDD.

[118] Stephen Tyree et al. Parallel boosted regression trees for web search ranking, 2011, WWW.

[119] Tie-Yan Liu et al. Future directions in learning to rank, 2010, Yahoo! Learning to Rank Challenge.

[120] John D. Lafferty et al. A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval, 2017, SIGIR Forum.

[121] Manzur Murshed et al. From Tf-Idf to learning-to-rank: An overview, 2016.

[122] Nitesh V. Chawla et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[123] Elie Bienenstock et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[124] Trevor J. Hastie et al. Confidence intervals for random forests: the jackknife and the infinitesimal jackknife, 2013, J. Mach. Learn. Res.

[125] Tie-Yan Liu et al. Learning to rank: from pairwise approach to listwise approach, 2007, ICML '07.

[126] Jean-Philippe Vert et al. Consistency of Random Forests, 2014, arXiv:1405.2881.

[127] Ya Zhang et al. Active Learning for Ranking through Expected Loss Optimization, 2010, IEEE Transactions on Knowledge and Data Engineering.

[128] Peter Bailey et al. Relevance assessment: are judges exchangeable and does it matter, 2008, SIGIR '08.

[129] ChengXiang Zhai et al. When documents are very long, BM25 fails!, 2011, SIGIR.

[130] W. Bruce Croft et al. Two-Stage Learning to Rank for Information Retrieval, 2013, ECIR.

[131] Hang Li et al. Improving quality of training data for learning to rank using click-through data, 2010, WSDM '10.

[132] Muhammad Ibrahim et al. Comparing Pointwise and Listwise Objective Functions for Random-Forest-Based Learning-to-Rank, 2016, ACM Trans. Inf. Syst.

[133] Emine Yilmaz et al. Semi-supervised learning to rank with preference regularization, 2011, CIKM '11.

[134] Quoc V. Le et al. Learning to Rank with Nonsmooth Cost Functions, 2006, NIPS.

[135] Jaana Kekäläinen et al. IR evaluation methods for retrieving highly relevant documents, 2000, SIGIR '00.

[136] Hongwei Ding et al. Trees Weighting Random Forest Method for Classifying High-Dimensional Noisy Data, 2010, IEEE 7th International Conference on E-Business Engineering.

[137] Hugo Zaragoza et al. The Probabilistic Relevance Framework: BM25 and Beyond, 2009, Found. Trends Inf. Retr.

[138] Misha Denil et al. Narrowing the Gap: Random Forests In Theory and In Practice, 2013, ICML.

[139] Chao Chen et al. Using Random Forest to Learn Imbalanced Data, 2004.

[140] Parth Gupta et al. Learning to Rank: Using Bayesian Networks, 2011.

[141] Philip S. Yu et al. Is random model better? On its accuracy and efficiency, 2003, Third IEEE International Conference on Data Mining.

[142] Lidan Wang et al. Learning to efficiently rank, 2010, SIGIR.

[143] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[144] Gareth James. Variance and Bias for General Loss Functions, 2003, Machine Learning.

[145] Klaus Obermayer et al. Support vector learning for ordinal regression, 1999.

[146] Torsten Hothorn et al. Recursive partitioning on incomplete data using surrogate decisions and multiple imputation, 2012, Comput. Stat. Data Anal.

[147] Tao Qin et al. A general approximation framework for direct optimization of information retrieval measures, 2010, Information Retrieval.

[148] Cristina V. Lopes et al. Bagging gradient-boosted trees for high precision, low variance ranking models, 2011, SIGIR.

[149] Rajeev Motwani et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW.

[150] Ullrich Köthe et al. On Oblique Random Forests, 2011, ECML/PKDD.

[151] Ying Zhang et al. Differences in effectiveness across sub-collections, 2012, CIKM.

[152] Alistair Moffat et al. Seven Numeric Properties of Effectiveness Metrics, 2013, AIRS.

[153] Stephen E. Robertson et al. Simple BM25 extension to multiple weighted fields, 2004, CIKM '04.

[154] Vincent Lepetit et al. Fast Keypoint Recognition Using Random Ferns, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[155] Hwanjo Yu. SVM selective sampling for ranking with application to data retrieval, 2005, KDD '05.

[156] Feng Pan et al. Feature selection for ranking using boosted trees, 2009, CIKM.

[157] Horst Bischof et al. On-line Random Forests, 2009, IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops).

[158] J. Friedman et al. On bagging and nonlinear estimation, 2007.

[159] Kilian Q. Weinberger et al. Web-Search Ranking with Initialized Gradient Boosted Regression Trees, 2010, Yahoo! Learning to Rank Challenge.

[160] Bertrand Thirion et al. Random Forests based feature selection for decoding fMRI data, 2010.

[161] Erwan Scornet et al. On the asymptotics of random forests, 2014, J. Multivar. Anal.

[162] Trevor Hastie. The Elements of Statistical Learning, 2001.

[163] Berkant Barla Cambazoglu et al. Early exit optimizations for additive machine learned ranking systems, 2010, WSDM '10.

[164] Yi Lin. Random Forests and Adaptive Nearest Neighbors, 2006.

[165] Geoffrey I. Webb et al. The Need for Low Bias Algorithms in Classification Learning from Large Data Sets, 2002, PKDD.

[166] Emine Yilmaz et al. Document selection methodologies for efficient and effective learning-to-rank, 2009, SIGIR.

[167] Craig MacDonald et al. Learning Models for Ranking Aggregates, 2011, ECIR.

[168] Andrew McCallum et al. A Machine Learning Approach to Building Domain-Specific Search Engines, 1999, IJCAI.

[169] Rich Caruana et al. An empirical comparison of supervised learning algorithms, 2006, ICML.

[170] Oluwasanmi Koyejo et al. Learning to Rank With Bregman Divergences and Monotone Retargeting, 2012, UAI.