
Convergence Rates of Gradient Descent and MM Algorithms for Generalized Bradley-Terry Models

We show tight convergence rate bounds for gradient descent and MM algorithms for maximum likelihood estimation and maximum a posteriori probability estimation of a popular Bayesian inference method for generalized Bradley-Terry models. This class of models includes the Bradley-Terry model of paired comparisons, the Rao-Kupper model of paired comparisons with ties, the Luce choice model, and the Plackett-Luce ranking model. Our results show that MM algorithms have the same convergence rates as gradient descent algorithms, up to constant factors. For maximum likelihood estimation, convergence is linear, with the rate crucially determined by the algebraic connectivity of the matrix of item-pair co-occurrences in the observed comparison data. For Bayesian inference, convergence is also linear, with the rate determined by a parameter of the prior distribution in a way that can make convergence arbitrarily slow for small values of this parameter. We propose a simple first-order acceleration method that resolves this slow-convergence issue.

Milan Vojnović | Seyoung Yun | Kaifang Zhou
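The abstract packs several algorithmic statements into one paragraph. The following Python/NumPy sketch is a minimal illustration under our own assumptions, not the authors' implementation. It runs the classical Zermelo/Hunter MM update and fixed-step gradient ascent for maximum likelihood in the plain Bradley-Terry model. It also computes the algebraic connectivity (second-smallest eigenvalue of the Laplacian of the item-pair co-occurrence matrix) and runs the MM update for MAP estimation under an assumed independent Gamma(a, b) prior on each strength. The helper names (`cooccurrence`, `laplacian`, `algebraic_connectivity`, `bt_mm`, `bt_gd`), the toy win counts, the step size 4/λ_max and the tolerances are hypothetical choices. The acceleration method proposed in the paper is not reproduced here.

```python
import numpy as np


def cooccurrence(wins):
    """n_ij = number of comparisons between items i and j (zero diagonal)."""
    wins = np.asarray(wins, dtype=float)
    return wins + wins.T


def laplacian(cooc):
    """Graph Laplacian D - N of the co-occurrence matrix N."""
    return np.diag(cooc.sum(axis=1)) - cooc


def algebraic_connectivity(wins):
    """Second-smallest Laplacian eigenvalue (Fiedler value); > 0 iff connected."""
    return np.linalg.eigvalsh(laplacian(cooccurrence(wins)))[1]


def bt_mm(wins, a=1.0, b=0.0, tol=1e-10, max_iter=200_000):
    """MM update w_i <- (a - 1 + W_i) / (b + sum_j n_ij / (w_i + w_j)).

    a=1, b=0 gives the maximum likelihood update; a>1, b>0 gives the MAP update
    under an ASSUMED independent Gamma(a, b) prior on each strength w_i.
    Returns (w, iterations used).
    """
    cooc = cooccurrence(wins)
    total_wins = np.asarray(wins, dtype=float).sum(axis=1)
    w = np.ones(len(total_wins))
    for it in range(1, max_iter + 1):
        denom = (cooc / (w[:, None] + w[None, :])).sum(axis=1)
        w_new = (a - 1.0 + total_wins) / (b + denom)
        if b == 0.0:
            # The likelihood is scale-free, so pin the geometric mean to 1.
            w_new /= np.exp(np.log(w_new).mean())
        if np.max(np.abs(w_new - w)) < tol:
            return w_new, it
        w = w_new
    return w, max_iter


def bt_gd(wins, tol=1e-10, max_iter=200_000):
    """Fixed-step gradient ascent on the log-likelihood in theta = log w.

    The negative Hessian is bounded by L/4 (L = co-occurrence Laplacian),
    so 4 / lambda_max(L) is a safe step size. Returns (theta, iterations).
    """
    cooc = cooccurrence(wins)
    total_wins = np.asarray(wins, dtype=float).sum(axis=1)
    step = 4.0 / np.linalg.eigvalsh(laplacian(cooc))[-1]
    theta = np.zeros(len(total_wins))  # gradient sums to 0, so the mean stays 0
    for it in range(1, max_iter + 1):
        # p[i, j] = P(i beats j) = sigmoid(theta_i - theta_j)
        p = 1.0 / (1.0 + np.exp(theta[None, :] - theta[:, None]))
        grad = total_wins - (cooc * p).sum(axis=1)
        if np.max(np.abs(grad)) < tol:
            return theta, it
        theta = theta + step * grad
    return theta, max_iter


# Hypothetical toy data: wins[i, j] = number of times item i beat item j.
wins = np.array([[0, 3, 2, 1],
                 [1, 0, 3, 2],
                 [2, 1, 0, 3],
                 [1, 2, 1, 0]])

print("algebraic connectivity:", algebraic_connectivity(wins))
w_mm, it_mm = bt_mm(wins)
theta_gd, it_gd = bt_gd(wins)
print("MLE log-strengths via MM:", np.log(w_mm), "iterations:", it_mm)
print("MLE log-strengths via GD:", theta_gd, "iterations:", it_gd)
for a in (2.0, 1.01):
    _, it_map = bt_mm(wins, a=a, b=1.0)
    print(f"MAP with a={a}, b=1: iterations={it_map}")
```

Both MLE runs should agree on the centered log-strengths, and the algebraic connectivity is positive because every pair of toy items was compared. For the MAP runs, multiplying the update by its denominator and summing over items shows that any fixed point satisfies b·Σ_i w_i = K(a - 1), with K the number of items, since Σ_i w_i Σ_j n_ij/(w_i + w_j) equals the total number of comparisons. The overall scale is therefore set by the prior alone. In this toy setting that direction is pinned only weakly as a approaches 1, which is why the a = 1.01 run needs many more iterations. The sketch does not establish whether this is exactly the prior parameter analysed in the paper.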
