Least Squares Revisited: Scalable Approaches for Multi-class Prediction
Le Song | Nikos Karampatziakis | Sham M. Kakade | Gregory Valiant | Alekh Agarwal