Large Linear Classification When Data Cannot Fit in Memory

Recent advances in linear classification have shown that for applications such as document classification, training can be extremely efficient. However, most existing training methods assume that the data can be stored in memory. They cannot be easily applied to data larger than the memory capacity because they require random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step, a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework, for primal and dual SVMs, respectively. Because the data cannot fit in memory, many design considerations differ substantially from those for traditional algorithms. We discuss existing approaches that can handle data larger than memory and compare the proposed method with them. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
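The block-by-block idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an L1-loss linear SVM without a bias term, uses plain dual coordinate descent as the per-block solver, and stores blocks as `.npz` files. The function names `split_into_blocks` and `block_minimization` and the parameters `outer_iters` and `inner_iters` are hypothetical, chosen only for illustration.

```python
import os
import numpy as np


def split_into_blocks(X, y, n_blocks, directory):
    """Write the data to disk as n_blocks files and return their paths."""
    paths = []
    for j, idx in enumerate(np.array_split(np.arange(len(y)), n_blocks)):
        path = os.path.join(directory, f"block_{j}.npz")
        np.savez(path, X=X[idx], y=y[idx], idx=idx)
        paths.append(path)
    return paths


def block_minimization(paths, n_samples, n_features, C=1.0,
                       outer_iters=20, inner_iters=5):
    """Sketch: only one block is held in memory at a time.

    Assumption: each block is handled by dual coordinate descent
    for an L1-loss SVM; w is kept equal to sum_i alpha_i * y_i * x_i.
    """
    alpha = np.zeros(n_samples)
    w = np.zeros(n_features)
    for _ in range(outer_iters):
        for path in paths:
            # Sequential read of one block from disk.
            with np.load(path) as block:
                Xb, yb, idx = block["X"], block["y"], block["idx"]
            sq_norms = np.einsum("ij,ij->i", Xb, Xb)
            # Several passes over the in-memory block before loading the next.
            for _ in range(inner_iters):
                for i in range(len(yb)):
                    if sq_norms[i] == 0.0:
                        continue
                    grad = yb[i] * (w @ Xb[i]) - 1.0
                    old = alpha[idx[i]]
                    new = min(max(old - grad / sq_norms[i], 0.0), C)
                    w += (new - old) * yb[i] * Xb[i]
                    alpha[idx[i]] = new
    return w, alpha
```

In this sketch the only state that must stay in memory across blocks is `w` and `alpha`; the instances themselves are read sequentially, one block at a time, which avoids random disk access.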
