Learning Using Large Datasets

This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation–estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff that involves the computational complexity of the underlying optimization algorithms in non-trivial ways. For instance, a mediocre optimization algorithm, stochastic gradient descent, is shown to perform very well on large-scale learning problems.
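To make the claim about stochastic gradient descent concrete, the following is a minimal sketch of plain SGD for least-squares linear regression. It is an illustrative assumption, not the paper's experimental setup; the function name, learning rate, and synthetic data are all hypothetical. The key property it exhibits is that each update touches a single example, so per-step cost is independent of the dataset size, which is what makes a "mediocre" optimizer competitive in the large-scale regime.

import numpy as np

def sgd_linear(X, y, lr=0.01, epochs=5, seed=0):
    """Plain SGD for least-squares linear regression (illustrative sketch).

    Each update uses one example, so the cost of a step does not grow
    with the number of training points.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Gradient of the squared loss on a single example.
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# Usage on synthetic data (hypothetical):
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=1000)
print(sgd_linear(X, y))  # approximately recovers w_true

With a large dataset, running such an optimizer for a single pass already yields a low-precision solution; the paper's point is that in the large-scale regime this coarse optimization accuracy is acceptable because the estimation error, not the optimization error, dominates.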
