Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach

We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space using kernels as radial basis functions and finds the best-fitting surface in that space by minimizing a complexity-penalized least squares problem. We argue that the method is well suited to social science inquiry because it avoids strong parametric assumptions, yet permits interpretation in ways analogous to generalized linear models while also allowing richer examination of nonlinearities, interactions, and heterogeneous effects. We also extend the method in several directions to make it more effective for social inquiry: (1) we derive estimators for the pointwise marginal effects and their variances, (2) we establish unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) we propose a simple automated rule for choosing the kernel bandwidth, and (4) we provide companion software. We illustrate the method through simulations and empirical examples.
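To make the mechanics concrete, the sketch below implements the core of a KRLS-style estimator in Python under assumptions of our own: a Gaussian kernel, bandwidth sigma^2 set to the number of covariates (one simple default; the paper proposes its own automated rule), and a fixed regularization parameter lam rather than a data-driven choice. The function names (gaussian_kernel, krls_fit, krls_marginal_effects) are ours for illustration, not the API of the paper's companion software.

    # Minimal sketch of KRLS as kernel ridge regression with a Gaussian kernel.
    # Assumed defaults (not from the paper): inputs already standardized,
    # sigma^2 = number of covariates, fixed lam instead of cross-validation.
    import numpy as np

    def gaussian_kernel(X, Z, sigma2):
        """Pairwise Gaussian kernel K[i, j] = exp(-||X_i - Z_j||^2 / sigma2)."""
        sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq_dists / sigma2)

    def krls_fit(X, y, lam=0.5):
        """Solve (K + lam*I) c = y for the coefficient vector c."""
        n, d = X.shape
        sigma2 = d  # simple bandwidth default: sigma^2 = number of covariates
        K = gaussian_kernel(X, X, sigma2)
        c = np.linalg.solve(K + lam * np.eye(n), y)
        return c, sigma2

    def krls_predict(X_train, c, sigma2, X_new):
        """Fitted surface: y_hat(x) = sum_i c_i * k(x, x_i)."""
        return gaussian_kernel(X_new, X_train, sigma2) @ c

    def krls_marginal_effects(X_train, c, sigma2, X_new):
        """Pointwise marginal effects of the fitted surface, one per covariate:
        d y_hat / d x_d = -(2/sigma2) * sum_i c_i * (x_d - x_{i,d}) * k(x, x_i)."""
        K = gaussian_kernel(X_new, X_train, sigma2)     # shape (m, n)
        diff = X_new[:, None, :] - X_train[None, :, :]  # shape (m, n, d)
        return -(2.0 / sigma2) * np.einsum('mn,mnd->md', K * c[None, :], diff)

    # Illustrative usage on simulated data (hypothetical, not from the paper):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.sin(X[:, 0]) + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=200)
    c, sigma2 = krls_fit(X, y)
    effects = krls_marginal_effects(X, c, sigma2, X)
    print(effects.mean(axis=0))  # average marginal effects, read like GLM coefficients

Averaging the pointwise marginal effects across observations yields a single summary per covariate that can be read in a way analogous to a regression coefficient, while the full matrix of pointwise effects exposes the nonlinearities and heterogeneity the abstract highlights.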
