Machine Learning Methods and Models for Ranking

Ranking problems are ubiquitous and occur in a variety of domains, including social choice, information retrieval, computational biology and many others. Recent advances in information technology have opened new data processing possibilities and significantly increased the complexity of methods that are computationally feasible. Through these advances, ranking models are now beginning to be applied to many new and diverse problems. The data in these problems, which ranges from gene expressions to images and web documents, has vastly different properties and is often not human-generated. This makes it challenging to apply many existing ranking models, which originate primarily in social choice and are typically designed for human-generated preference data. As the field continues to evolve, a new trend has emerged in which machine learning methods are used to learn ranking models automatically. Although these methods typically lack the theoretical support of social choice models, they often show excellent empirical performance and can handle large and diverse data while placing virtually no restrictions on the data type. These models have now been successfully applied to many diverse ranking problems, including image retrieval, protein selection, machine translation and many others. Inspired by these promising results, the work presented in this thesis aims to advance machine learning methods for ranking and to develop new techniques that allow effective modeling of existing and future problems. The presented work concentrates on three different but related domains: information retrieval, preference aggregation and collaborative filtering. In each domain we develop new models together with learning and inference methods, and we empirically verify these models on real-life data.
