Recent Advances of Large-Scale Linear Classification
[1] Alexander J. Smola,et al. Bundle Methods for Regularized Risk Minimization , 2010, J. Mach. Learn. Res..
[2] Zellig S. Harris,et al. Distributional Structure , 1954 .
[3] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[4] R. Tibshirani,et al. Regression shrinkage and selection via the lasso: a retrospective , 2011 .
[5] Dan Roth,et al. Selective block minimization for faster convergence of limited memory large-scale linear models , 2011, KDD.
[6] Thomas Hofmann,et al. Communication-Efficient Distributed Dual Coordinate Ascent , 2014, NIPS.
[7] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.
[8] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[9] Chih-Jen Lin,et al. A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.
[10] Gérard Dreyfus,et al. Single-layer learning revisited: a stepwise procedure for building and training a neural network , 1989, NATO Neurocomputing.
[11] Yurii Nesterov,et al. Primal-dual subgradient methods for convex problems , 2005, Math. Program..
[12] Chih-Jen Lin,et al. Iterative Scaling and Coordinate Descent Methods for Maximum Entropy , 2009, ACL.
[13] Bernhard E. Boser,et al. A training algorithm for optimal margin classifiers , 1992, COLT '92.
[14] Alexander J. Smola,et al. Efficient mini-batch training for stochastic optimization , 2014, KDD.
[15] T. Joachims,et al. Making Large-Scale SVM Learning Practical , 1999 .
[16] Harish Karnick,et al. Random Feature Maps for Dot Product Kernels , 2012, AISTATS.
[17] Sören Sonnenburg,et al. Optimized cutting plane algorithm for support vector machines , 2008, ICML '08.
[18] Yoram Singer,et al. Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.
[19] Ryan M. Rifkin,et al. In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..
[20] Stephen P. Boyd,et al. An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression , 2007, J. Mach. Learn. Res..
[21] R. Memisevic. Dual Optimization of Conditional Probability Models , 2006 .
[22] T. Minka. A comparison of numerical optimizers for logistic regression , 2004 .
[23] Wotao Yin,et al. A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression , 2010, J. Mach. Learn. Res..
[24] Chih-Jen Lin,et al. A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification , 2010, J. Mach. Learn. Res..
[25] Ping Li,et al. b-Bit minwise hashing , 2009, WWW '10.
[26] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[27] Joseph K. Bradley,et al. Parallel Coordinate Descent for L1-Regularized Loss Minimization , 2011, ICML.
[28] Xiong Li,et al. Bundle CDN: A Highly Parallelized Approach for Large-Scale ℓ1-Regularized Logistic Regression , 2013, ECML/PKDD.
[29] Patrick Gallinari,et al. Erratum: SGDQN is Less Careful than Expected , 2010, J. Mach. Learn. Res..
[30] T. Steihaug. The Conjugate Gradient Method and Trust Regions in Large Scale Optimization , 1983 .
[31] John Langford. Vowpal Wabbit , 2014 .
[32] Marc Teboulle,et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..
[33] H. Robbins. A Stochastic Approximation Method , 1951 .
[34] Petros Drineas,et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning , 2005, J. Mach. Learn. Res..
[35] Yaakov Tsaig,et al. Fast Solution of ℓ1-Norm Minimization Problems When the Solution May Be Sparse , 2008, IEEE Transactions on Information Theory.
[36] John C. Duchi,et al. Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[37] Yuh-Jye Lee,et al. RSVM: Reduced Support Vector Machines , 2001, SDM.
[38] J. S. Cramer. The Origins of Logistic Regression , 2002 .
[39] Stephen J. Wright,et al. Sparse reconstruction by separable approximation , 2009, IEEE Trans. Signal Process..
[40] Zheng Chen,et al. P-packSVM: Parallel Primal grAdient desCent Kernel SVM , 2009, 2009 Ninth IEEE International Conference on Data Mining.
[41] R. Tibshirani,et al. Pathwise Coordinate Optimization , 2007, 0708.1485.
[42] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[43] Ambuj Tewari,et al. Stochastic methods for l1 regularized loss minimization , 2009, ICML '09.
[44] Yann LeCun,et al. Large Scale Online Learning , 2003, NIPS.
[45] Joshua Goodman,et al. Sequential Conditional Generalized Iterative Scaling , 2002, ACL.
[46] Thorsten Joachims,et al. Cutting-plane training of structural SVMs , 2009, Machine Learning.
[47] Masashi Sugiyama,et al. Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation , 2009, J. Mach. Learn. Res..
[48] Ping Li,et al. Theory and applications of b-bit minwise hashing , 2011, Commun. ACM.
[49] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..
[50] Dimitris Achlioptas,et al. Database-friendly random projections: Johnson-Lindenstrauss with binary coins , 2003, J. Comput. Syst. Sci..
[51] Isabelle Guyon,et al. Comparison of classifier methods: a case study in handwritten digit recognition , 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing.
[52] Edward Y. Chang,et al. Parallelizing Support Vector Machines on Distributed Computers , 2007, NIPS.
[53] Glenn Fung,et al. A Feature Selection Newton Method for Support Vector Machine Classification , 2004, Comput. Optim. Appl..
[54] Michael C. Ferris,et al. Interior-Point Methods for Massive Support Vector Machines , 2002, SIAM J. Optim..
[55] Thorsten Joachims,et al. Making large scale SVM learning practical , 1998 .
[56] Amos Storkey,et al. Advances in Neural Information Processing Systems 20 , 2007 .
[57] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[58] Mark W. Schmidt,et al. Accelerated training of conditional random fields with stochastic gradient methods , 2006, ICML.
[59] E. M. Gertz,et al. Support vector machine classifiers for large data sets , 2006 .
[60] H. Zou,et al. Regularization and variable selection via the elastic net , 2005 .
[61] David R. Musicant,et al. Successive overrelaxation for support vector machines , 1999, IEEE Trans. Neural Networks.
[62] Jason Weston,et al. Fast Kernel Classifiers with Online and Active Learning , 2005, J. Mach. Learn. Res..
[63] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.
[64] Gideon S. Mann,et al. Distributed Training Strategies for the Structured Perceptron , 2010, NAACL.
[65] Chih-Jen Lin,et al. LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..
[66] J. Kiefer,et al. Stochastic Estimation of the Maximum of a Regression Function , 1952 .
[67] Jason Weston,et al. Multi-Class Support Vector Machines , 1998 .
[68] John Langford,et al. Slow Learners are Fast , 2009, NIPS.
[69] Jiawei Han,et al. Classifying large data sets using SVMs with hierarchical clusters , 2003, KDD '03.
[70] Peter Richtárik,et al. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function , 2011, Mathematical Programming.
[71] Rong Yan,et al. A Faster Iterative Scaling Algorithm for Conditional Exponential Model , 2003, ICML.
[72] Katya Scheinberg,et al. Efficient SVM Training Using Low-Rank Kernel Representations , 2002, J. Mach. Learn. Res..
[73] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..
[74] Jieping Ye,et al. Large-scale sparse logistic regression , 2009, KDD.
[75] Matthias W. Seeger,et al. Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.
[76] Andrzej Stachurski,et al. Parallel Optimization: Theory, Algorithms and Applications , 2000, Scalable Comput. Pract. Exp..
[77] Thomas Hofmann,et al. Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..
[78] Robert Tibshirani,et al. 1-norm Support Vector Machines , 2003, NIPS.
[79] Lin Xiao,et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..
[80] Chih-Jen Lin,et al. Newton's Method for Large Bound-Constrained Optimization Problems , 1999, SIAM J. Optim..
[81] Mário A. T. Figueiredo,et al. Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems , 2007, IEEE Journal of Selected Topics in Signal Processing.
[82] John Langford,et al. Sparse Online Learning via Truncated Gradient , 2008, NIPS.
[83] Gerhard Weikum,et al. Fast logistic regression for text categorization with variable-length n-grams , 2008, KDD.
[84] Chih-Jen Lin,et al. Dual coordinate descent methods for logistic regression and maximum entropy models , 2011, Machine Learning.
[85] Nello Cristianini,et al. The Kernel-Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines , 1998, ICML.
[86] Nello Cristianini,et al. Large Margin DAGs for Multiclass Classification , 1999, NIPS.
[87] B. Mercier,et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation , 1976 .
[88] Yurii Nesterov,et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..
[89] Cheng-Hao Tsai,et al. Large-scale logistic regression and linear support vector machines using spark , 2014, 2014 IEEE International Conference on Big Data (Big Data).
[90] Stephen J. Wright,et al. ASSET: Approximate Stochastic Subgradient Estimation Training for Support Vector Machines , 2012, ICPRAM.
[91] Chih-Jen Lin,et al. Iterative Scaling and Coordinate Descent Methods for Maximum Entropy , 2009, ACL/IJCNLP.
[92] Olvi L. Mangasarian,et al. A finite newton method for classification , 2002, Optim. Methods Softw..
[93] O. Mangasarian,et al. Massive data discrimination via linear support vector machines , 2000 .
[94] Fernando Pereira,et al. Shallow Parsing with Conditional Random Fields , 2003, NAACL.
[95] Michael I. Jordan,et al. Predictive low-rank decomposition for kernel methods , 2005, ICML.
[96] Jianfeng Gao,et al. Scalable training of L1-regularized log-linear models , 2007, ICML '07.
[97] David Madigan,et al. Large-Scale Bayesian Logistic Regression for Text Categorization , 2007, Technometrics.
[98] John Langford,et al. Parallel Online Learning , 2011, ArXiv.
[99] Olvi L. Mangasarian,et al. Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization , 2006, J. Mach. Learn. Res..
[100] P. Tseng,et al. On the convergence of the coordinate descent method for convex differentiable minimization , 1992 .
[101] Ohad Shamir,et al. Optimal Distributed Online Prediction Using Mini-Batches , 2010, J. Mach. Learn. Res..
[102] Stephen J. Wright,et al. Approximate Stochastic Subgradient Estimation Training for Support Vector Machines , 2011, ArXiv.
[103] Tong Zhang,et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.
[104] Rob Malouf,et al. A Comparison of Algorithms for Maximum Entropy Parameter Estimation , 2002, CoNLL.
[105] Vivek S. Borkar,et al. Distributed Asynchronous Incremental Subgradient Methods , 2001 .
[106] Adam L. Berger,et al. A Maximum Entropy Approach to Natural Language Processing , 1996, CL.
[107] Peter Bühlmann. Regression shrinkage and selection via the Lasso: a retrospective (Robert Tibshirani): Comments on the presentation , 2011 .
[108] A. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.
[109] Georgios B. Giannakis,et al. Consensus-Based Distributed Support Vector Machines , 2010, J. Mach. Learn. Res..
[110] J. Darroch,et al. Generalized Iterative Scaling for Log-Linear Models , 1972 .
[111] Yoram Singer,et al. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..
[112] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..
[113] Kenneth Ward Church,et al. Very sparse random projections , 2006, KDD '06.
[114] Kim-Chuan Toh,et al. A coordinate gradient descent method for ℓ1-regularized convex minimization , 2011, Comput. Optim. Appl..
[115] Stephen P. Boyd,et al. An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares , 2007, IEEE Journal of Selected Topics in Signal Processing.
[116] Chih-Jen Lin,et al. A formal analysis of stopping criteria of decomposition methods for support vector machines , 2002, IEEE Trans. Neural Networks.
[117] Chih-Jen Lin,et al. Large Linear Classification When Data Cannot Fit in Memory , 2011, TKDD.
[118] J. E. Kelley,et al. The Cutting-Plane Method for Solving Convex Programs , 1960 .
[119] Chih-Jen Lin,et al. Trust Region Newton Method for Logistic Regression , 2008, J. Mach. Learn. Res..
[120] Peter L. Bartlett,et al. Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks , 2008, J. Mach. Learn. Res..
[121] Thorsten Joachims,et al. Training linear SVMs in linear time , 2006, KDD '06.
[122] Kilian Q. Weinberger,et al. Feature hashing for large scale multitask learning , 2009, ICML '09.
[123] Chih-Jen Lin,et al. Large linear classification when data cannot fit in memory , 2010, KDD '10.
[124] S. Sathiya Keerthi,et al. A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs , 2005, J. Mach. Learn. Res..
[125] Chih-Jen Lin,et al. Training and Testing Low-degree Polynomial Data Mappings via Linear SVM , 2010, J. Mach. Learn. Res..
[126] Chih-Jen Lin,et al. Asymptotic Behaviors of Support Vector Machines with Gaussian Kernel , 2003, Neural Computation.
[127] Masashi Sugiyama,et al. Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparse Learning , 2009 .
[128] Sanjay Ghemawat,et al. MapReduce: simplified data processing on large clusters , 2008, CACM.
[129] John Langford,et al. A reliable effective terascale linear learning system , 2011, J. Mach. Learn. Res..
[130] Chih-Jen Lin,et al. A sequential dual method for large scale multi-class linear svms , 2008, KDD.
[131] H. J. Mclaughlin,et al. Learn , 2002 .
[132] Tom White,et al. Hadoop: The Definitive Guide , 2009 .
[133] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2007, ICML '07.
[134] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.
[135] Deepayan Chakrabarti,et al. Contextual advertising by combining relevance with click feedback , 2008, WWW.
[136] Chih-Jen Lin,et al. Trust region Newton methods for large-scale logistic regression , 2007, ICML '07.
[137] Ping Li,et al. Hashing Algorithms for Large-Scale Learning , 2011, NIPS.
[138] Georgios B. Giannakis,et al. Consensus-based distributed linear support vector machines , 2010, IPSN '10.
[139] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[140] Trevor Hastie,et al. Regularization Paths for Generalized Linear Models via Coordinate Descent , 2010, Journal of statistical software.
[141] John D. Lafferty,et al. Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..
[142] Sören Sonnenburg,et al. COFFIN: A Computational Framework for Linear SVMs , 2010, ICML.
[143] Rómer Rosales,et al. Simple and Scalable Response Prediction for Display Advertising , 2014, ACM Trans. Intell. Syst. Technol..
[144] Ming-Syan Chen,et al. Efficient Kernel Approximation for Large-Scale Support Vector Machine Classification , 2011, SDM.
[145] Ben Taskar,et al. Max-Margin Markov Networks , 2003, NIPS.
[146] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[147] I. Daubechies,et al. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint , 2003, math/0307152.
[148] Chih-Jen Lin,et al. Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines , 2008, J. Mach. Learn. Res..
[149] Chia-Hua Ho,et al. An improved GLMNET for l1-regularized logistic regression , 2011, J. Mach. Learn. Res..
[150] Dianne P. O'Leary,et al. Adaptive constraint reduction for training support vector machines , 2008 .
[151] Nathan Ratliff,et al. (Online) Subgradient Methods for Structured Prediction , 2007 .
[152] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[153] Chih-Jen Lin,et al. Generalized Bradley-Terry Models and Multi-Class Probability Estimates , 2006, J. Mach. Learn. Res..
[154] Koby Crammer,et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..
[155] Cho-Jui Hsieh,et al. Coordinate Descent Method for Large-scale L2-loss Linear SVM , 2008 .
[156] Rasmus Pagh,et al. Fast and scalable polynomial kernels via explicit feature maps , 2013, KDD.
[157] Joachim M. Buhmann,et al. Kernel Expansion for Online Preference Tracking , 2008, ISMIR.
[158] Carsten Wiuf,et al. Bounded coordinate-descent for biological sequence classification in high dimensional predictor space , 2010, KDD.
[159] Chih-Jen Lin,et al. A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.
[160] G. Wahba,et al. Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data , 2004 .
[161] John Langford,et al. Hash Kernels , 2009, AISTATS.
[162] Simon Günter,et al. A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.
[163] Yoram Singer,et al. Efficient Online and Batch Learning Using Forward Backward Splitting , 2009, J. Mach. Learn. Res..