Recent Advances of Large-Scale Linear Classification

This paper surveys the development of optimization methods for constructing linear classifiers suited to large-scale applications; for some data, the accuracy of these classifiers is close to that of nonlinear classifiers.