Restructuring forward step of MARS algorithm using a new knot selection procedure based on a mapping approach

In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a popular nonparametric regression technique used to describe the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear functions for local fits and applies an adaptive procedure to select the number and location of breakpoints (called knots). The function estimate is generated via a two-step procedure: forward selection and backward elimination. In the first step, a large number of local fits is obtained by selecting a large number of knots according to a lack-of-fit criterion; in the second, the least contributing local fits or knots are removed. In the conventional adaptive spline procedure, knots are selected from the set of all distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid this drawback, the knot points can be restricted to a subset of the data points. In this context, a new knot selection method is proposed that is based on a mapping approach such as self-organizing maps. With this method, fewer but more representative data points become eligible to be used as knots for function estimation in the forward step of MARS. The proposed method is applied to many simulated and real datasets, and the results show that it yields a time-efficient forward step for knot selection and model estimation without degrading model accuracy or prediction performance.
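
To make the idea of restricting knots to a representative subset concrete, the following is a minimal sketch, not the procedure proposed in the paper. It assumes a single predictor, a tiny one-dimensional self-organizing map with a linearly decaying learning rate and neighborhood radius, and a final step that snaps each map unit to its nearest observed value so that every candidate knot is still a data point. The names `hinge_pair` and `som_knots_1d` and all parameter defaults are hypothetical choices made for illustration.

```python
import random


def hinge_pair(x, knot):
    """Return the two piecewise linear basis values max(0, x - t) and max(0, t - x)."""
    return max(0.0, x - knot), max(0.0, knot - x)


def som_knots_1d(values, n_units=5, epochs=50, lr0=0.5, seed=0):
    """Pick at most n_units candidate knots from `values` with a tiny 1-D SOM.

    Illustrative assumption only: the paper's actual mapping procedure
    is not specified in the abstract.
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    # Start the map units evenly spaced over the observed range.
    units = [lo + (hi - lo) * (i + 0.5) / n_units for i in range(n_units)]
    data = list(values)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = int((n_units / 2) * (1.0 - frac))   # decaying neighborhood radius
        rng.shuffle(data)
        for x in data:
            # Best-matching unit: the unit closest to the sample.
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                d = abs(i - bmu)
                if d <= radius:
                    # Units nearer the winner on the map grid move more.
                    units[i] += lr * (x - units[i]) / (1 + d)
    # Snap each unit to its nearest observed value and drop duplicates,
    # so the candidate knots are a subset of the data points.
    return sorted({min(values, key=lambda v: abs(v - u)) for u in units})


if __name__ == "__main__":
    xs = [i / 10 for i in range(100)]   # 100 distinct predictor values
    knots = som_knots_1d(xs, n_units=5)
    print(len(xs), "distinct points reduced to", len(knots), "candidate knots")
    print([hinge_pair(2.5, t) for t in knots])
```

In this sketch a forward step would evaluate hinge pairs only at the returned candidates instead of at every distinct data value, which is where a saving in computation would come from under these assumptions.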
