Paper information - Recent Advances of Large-Scale Linear Classification. This paper is a survey of the development of optimization methods for constructing linear classifiers suitable for large-scale applications; for some data, their accuracy is close to that of nonlinear classifiers.

Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has been shown to be close to that of nonlinear classifiers such as kernel methods, while training and testing are much faster. Recently, many research works have developed efficient optimization methods to construct linear classifiers and applied them to large-scale applications. In this paper, we give a comprehensive survey of the recent development of this active research area.
