Paper information - Recent Advances of Large-Scale Linear Classification


Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has been shown to be close to that of nonlinear classifiers such as kernel methods, while training and testing are much faster. Recently, many research works have developed efficient optimization methods to construct linear classifiers and applied them to large-scale applications. In this paper, we give a comprehensive survey of recent developments in this active research area.
