Data Sparseness in Linear SVM

Large sparse datasets are common in many real-world applications, and linear SVM has proven very efficient for classifying them. However, it remains unclear how data sparseness affects its convergence behavior. To study this problem systematically, we propose a novel approach for generating large, sparse data from real-world datasets, using statistical inference and the data-sampling process of the PAC framework. We first study the convergence behavior of linear SVM experimentally and make several observations that are useful for real-world applications. We then offer theoretical proofs of these observations by analyzing the Bayes risk and the PAC bound. Our experimental and theoretical results are valuable for learning from large sparse datasets with linear SVM.
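
The abstract does not spell out the experimental protocol, but a minimal sketch of the kind of study it describes might look like the following: generate synthetic sparse datasets at several densities (a hypothetical stand-in for the paper's statistical-inference-based sampler), train a linear SVM with a Pegasos-style primal solver, and record how many passes it takes to converge. All names and parameters here (the density levels, the planted separator w_true, the SGDClassifier settings) are illustrative assumptions, not the authors' procedure.

```python
# Hypothetical sketch: probe how feature sparsity relates to convergence of a
# linear SVM trained with a Pegasos-style primal solver (SGD with hinge loss).
# The random generator below is a stand-in, NOT the paper's sampling method.
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features = 20_000, 5_000
w_true = rng.standard_normal(n_features)          # planted linear separator

for density in (0.001, 0.01, 0.1):
    # Sparse design matrix; labels come from the planted halfspace.
    X = sp.random(n_samples, n_features, density=density,
                  format="csr", random_state=0)
    y = np.sign(X @ w_true + 1e-12)               # +/-1 labels, zero rows -> +1

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    clf = SGDClassifier(loss="hinge", alpha=1e-4, tol=1e-4,
                        max_iter=1000, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"density={density:>6}: epochs_to_converge={clf.n_iter_:3d}, "
          f"test_acc={clf.score(X_te, y_te):.3f}")
```

Comparing the reported epoch counts and test accuracies across densities is one simple way to observe the sparsity-versus-convergence trend the paper studies; the paper's own experiments use real-world data and a PAC-style sampling scheme rather than this synthetic generator.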
