Large-Scale Machine Learning with Stochastic Gradient Descent
[1] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967 .
[2] Vladimir Vapnik, Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities , 1971 .
[3] John E. Dennis,et al. Numerical methods for unconstrained optimization and nonlinear equations , 1983, Prentice Hall series in computational mathematics.
[4] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[5] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[6] Bernard Widrow,et al. Adaptive switching circuits , 1988 .
[7] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[8] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[9] Peter L. Bartlett,et al. The Importance of Convexity in Learning with Squared Loss , 1998, IEEE Trans. Inf. Theory.
[10] Noboru Murata,et al. A Statistical Study on On-line Learning , 1999 .
[11] P. Massart. Some applications of concentration inequalities to statistics , 2000 .
[12] Sabine Buchholz,et al. Introduction to the CoNLL-2000 Shared Task Chunking , 2000, CoNLL/LLL.
[13] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[14] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms , 2002 .
[15] A. Tsybakov,et al. Optimal aggregation of classifiers in statistical learning , 2003 .
[16] Yiming Yang,et al. RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res.
[17] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[18] Léon Bottou,et al. On-line learning for very large data sets , 2005 .
[19] Thorsten Joachims,et al. Training linear SVMs in linear time , 2006, KDD '06.
[20] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[21] Chih-Jen Lin,et al. Trust region Newton methods for large-scale logistic regression , 2007, ICML '07.
[22] Nathan Srebro,et al. SVM optimization: inverse dependence on training set size , 2008, ICML '08.
[23] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res.
[24] Wei Xu,et al. Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent , 2011, ArXiv.