SPARSE PREDICTIVE MODELING FOR BANK TELEMARKETING SUCCESS USING SMOOTH-THRESHOLD ESTIMATING EQUATIONS

In this paper, we build and evaluate several models that predict the success of telemarketing calls for selling long-term bank deposits, using publicly available data from a Portuguese retail bank collected from 2008 to 2013 (Moro et al., 2014, Decision Support Systems). The data include multiple predictor variables, numeric or categorical, related to bank client, product, and socioeconomic attributes. Coding a categorical predictor variable as multiple dummy variables increases model dimensionality, so redundancy in the model parameterization is a practical concern. This motivates us to assess prediction performance under more parsimonious modeling. We apply contemporary penalized variable selection methods, including the lasso, the elastic net, the smoothly clipped absolute deviation (SCAD) penalty, and the minimax concave penalty (MCP), as well as the smooth-threshold estimating equation. In addition to variable selection, the smooth-threshold estimating equation can achieve automatic grouping of predictor variables, an alternative form of sparse modeling that may suit certain problems, such as dummy variables created from categorical predictor variables. The predictive power of each modeling approach is assessed by repeated cross-validation or sample splitting, in which one part of the data is used for training and the other for testing.
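The workflow described above (dummy coding, penalized fitting, repeated sample splitting) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain lasso penalty fitted by proximal gradient descent rather than the smooth-threshold estimating equation, and it runs on synthetic data rather than the bank data. All names (`dummy_encode`, `fit_lasso_logistic`, `repeated_split_accuracy`, `make_toy_data`, the job levels, and the effect sizes) are hypothetical and chosen only for illustration.

```python
import math
import random


def dummy_encode(values):
    """Expand one categorical column into 0/1 dummies, dropping the first level."""
    levels = sorted(set(values))
    kept = levels[1:]  # the first level acts as the reference category
    return [[1.0 if v == lev else 0.0 for lev in kept] for v in values], kept


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Logistic function, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, step=0.5, iters=200):
    """L1-penalized logistic regression by proximal gradient descent.

    Returns (intercept, coefficients); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * g0
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def repeated_split_accuracy(X, y, lam, repeats=3, test_frac=0.3, seed=0):
    """Average test accuracy over repeated random training/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = int(len(X) * test_frac)
    accs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        b0, beta = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
        hits = sum(
            (sigmoid(b0 + sum(b * x for b, x in zip(beta, X[i]))) >= 0.5) == (y[i] == 1)
            for i in test
        )
        accs.append(hits / n_test)
    return sum(accs) / repeats


def make_toy_data(n=300, seed=1):
    """Synthetic data: one six-level categorical predictor plus one numeric one."""
    rng = random.Random(seed)
    jobs = ["admin", "management", "retired", "services", "student", "technician"]
    effect = {"retired": 1.5, "student": 1.5}  # all other levels share a zero effect
    job, num, y = [], [], []
    for _ in range(n):
        j, z = rng.choice(jobs), rng.gauss(0.0, 1.0)
        prob = sigmoid(-1.0 + effect.get(j, 0.0) + 0.8 * z)
        job.append(j)
        num.append(z)
        y.append(1 if rng.random() < prob else 0)
    dummies, kept = dummy_encode(job)
    return [d + [z] for d, z in zip(dummies, num)], y, kept + ["z"]


X, y, names = make_toy_data()
acc = repeated_split_accuracy(X, y, lam=0.02)
_, coef = fit_lasso_logistic(X, y, lam=0.02)
print("columns after dummy coding:", names)
print("mean test accuracy:", round(acc, 3))
print("nonzero coefficients:", [nm for nm, c in zip(names, coef) if c != 0.0])
```

In the toy data a single six-level predictor already contributes five columns, and several of those levels share the same effect. A grouping method would aim to merge such levels, whereas the lasso in this sketch can only shrink individual dummy coefficients to zero.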

[1] Xiao-Hua Zhou, et al. Variable selection and semiparametric efficient estimation for the heteroscedastic partially linear single-index model, 2014, Comput. Stat. Data Anal.

[2] Hongzhe Li, et al. A Sparse Structured Shrinkage Estimator for Nonparametric Varying-Coefficient Model With an Application in Genomics, 2012, Journal of Computational and Graphical Statistics.

[3] Masao Ueki, et al. Automatic grouping using smooth-threshold estimating equations, 2011.

[4] Leo Breiman, et al. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author), 2001, Statistical Science.

[5] Paulo Cortez, et al. A data-driven approach to predict the success of bank telemarketing, 2014, Decis. Support Syst.

[6] H. Bondell, et al. Simultaneous Factor Selection and Collapsing Levels in ANOVA, 2009, Biometrics.

[8] Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty, 2010, arXiv:1002.4734.

[9] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[10] Jianqing Fan, et al. Homogeneity Pursuit, 2015, Journal of the American Statistical Association.

[11] Xiaotong Shen, et al. Grouping Pursuit Through a Regularization Solution Surface, 2010, Journal of the American Statistical Association.

[12] Heng Lian, et al. Bias-corrected GEE estimation and smooth-threshold GEE variable selection for single-index models with clustered data, 2011, J. Multivar. Anal.

[13] H. Zou. The Adaptive Lasso and Its Oracle Properties, 2006.

[14] Masao Ueki, et al. Multiple choice from competing regression models under multicollinearity based on standardized update, 2013, Comput. Stat. Data Anal.

[15] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[16] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of Statistical Software.

[17] Lixing Zhu, et al. Automatic variable selection for longitudinal generalized linear models, 2013, Comput. Stat. Data Anal.

[18] H. Bondell, et al. Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR, 2008, Biometrics.

[19] Jing Lv, et al. An efficient and robust variable selection method for longitudinal generalized linear models, 2015, Comput. Stat. Data Anal.

[20] Masao Ueki, et al. A note on automatic variable selection using smooth-threshold estimating equations, 2009.

[21] Jian Huang, et al. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, 2011, The Annals of Applied Statistics.

[22] W. Loh, et al. Consistent Variable Selection in Linear Models, 1995.

[23] Anirut Suebsing, et al. Feature Selection with Data Balancing for Prediction of Bank Telemarketing, 2014.

[24] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.