Using Support Vector Machines for Survey Research

Recent developments in machine learning allow for flexible functional-form estimation beyond the approaches typically used by survey researchers and social scientists. Support vector machines (SVMs) are one such technique. They are commonly used for binary classification problems, such as predicting whether or not an individual will participate in a survey. Since their inception, SVMs have been extended to multiclass (categorical) classification and to regression problems. Their versatility, combined with their good performance in the presence of a large number of predictors even when the number of cases is small, makes them appealing for a wide range of problems, including character recognition and text classification, speech and speaker verification, imputation, and record linkage. In this article, we provide a non-technical introduction to the main concepts of SVMs, discuss their advantages and disadvantages, and present ideas for how they can be used in survey research. Finally, we provide a hands-on example, including code, that shows how SVMs can be applied to a survey research problem and how their results compare to those of a traditional logistic regression.
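To make the comparison concrete, here is a minimal sketch that pits a linear SVM against logistic regression on a binary participation outcome. It is not the article's code or data. It assumes synthetic data (two hypothetical standardized predictors and a participation label coded +1/-1). It uses a linear soft-margin SVM with no kernel, trained by full-batch subgradient descent on the regularized hinge loss, and a logistic regression fitted by plain gradient descent. The function names (`make_data`, `train_linear_svm`, `train_logistic`) and parameter values (`lam`, `lr`, `epochs`) are illustrative choices only.

```python
import math
import random

# Hypothetical synthetic data: two standardized predictors per respondent and a
# label y = +1 (participates) or -1 (does not). For illustration only.
def make_data(n, rng):
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else -1
        x = [1.5 * y + rng.gauss(0, 1), 1.5 * y + rng.gauss(0, 1)]
        data.append((x, y))
    return data

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(data, lam=0.01, lr=0.1, epochs=500):
    """Linear soft-margin SVM: minimize lam/2 * ||w||^2 + mean hinge loss
    by full-batch subgradient descent (the bias b is not regularized)."""
    w, b, n = [0.0, 0.0], 0.0, len(data)
    for _ in range(epochs):
        gw = [lam * wi for wi in w]
        gb = 0.0
        for x, y in data:
            if y * (dot(w, x) + b) < 1:  # inside the margin or misclassified
                gw = [g - y * xi / n for g, xi in zip(gw, x)]
                gb -= y / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(data, lr=0.1, epochs=500):
    """Logistic regression fitted by gradient descent on the mean log-loss."""
    w, b, n = [0.0, 0.0], 0.0, len(data)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = sigmoid(dot(w, x) + b) - (1 if y == 1 else 0)
            gw = [g + err * xi / n for g, xi in zip(gw, x)]
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def accuracy(w, b, data):
    # Both models classify by the sign of the linear score w.x + b.
    hits = sum(1 for x, y in data if (1 if dot(w, x) + b >= 0 else -1) == y)
    return hits / len(data)

rng = random.Random(42)
train, test = make_data(200, rng), make_data(200, rng)
svm_w, svm_b = train_linear_svm(train)
lr_w, lr_b = train_logistic(train)
svm_acc = accuracy(svm_w, svm_b, test)
logit_acc = accuracy(lr_w, lr_b, test)
print(f"linear SVM accuracy:          {svm_acc:.3f}")
print(f"logistic regression accuracy: {logit_acc:.3f}")
```

Both models produce a linear decision rule here, so on data like these they tend to perform similarly. The flexible functional forms mentioned above come from nonlinear kernels, which this sketch deliberately omits. In practice a dedicated SVM library would normally be used instead of a hand-written optimizer.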
