A new approach to multivariate adaptive regression splines by using Tikhonov regularization and continuous optimization

This paper introduces a model-based approach to the important data mining tool multivariate adaptive regression splines (MARS), which was originally organized in a more model-free way. MARS is a modern methodology from statistical learning that is important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. It is very useful for high-dimensional problems and shows great promise for fitting nonlinear multivariate functions. The MARS algorithm for estimating the model function consists of two parts: the forward and the backward stepwise algorithms. In our paper, we propose not to use the backward stepwise algorithm. Instead, we construct a penalized residual sum of squares for MARS as a Tikhonov regularization problem, which is also known as ridge regression. We treat this problem using continuous optimization techniques, which we expect to become an important complementary technology and a model-based alternative to the backward stepwise algorithm. In particular, we apply the elegant framework of conic quadratic programming, an area of convex optimization that is very well structured, thereby resembling linear programming and permitting the use of powerful interior point methods. Building on these theoretical and algorithmic studies, the paper also contains an application to diabetes data. We evaluate and compare the performance of the established MARS and our new method, CMARS, in classifying diabetic persons, where CMARS turns out to be very competitive and promising.
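
To make the idea of a penalized residual sum of squares concrete, the following is a minimal sketch and not the paper's method: it fits MARS-style hinge functions at fixed, hand-picked knots and solves the Tikhonov problem min ||y - B theta||^2 + lam * ||L theta||^2 in closed form through the normal equations, rather than through conic quadratic programming. The knot positions, the synthetic data, the choice L = I (plain ridge regression) and the names `hinge_basis` and `tikhonov_fit` are all assumptions made for illustration.

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix of an intercept plus paired hinge functions
    max(0, x - t) and max(0, t - x) for each knot t (hypothetical helper)."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

def tikhonov_fit(B, y, L, lam):
    """Minimize ||y - B theta||^2 + lam * ||L theta||^2 by solving
    the normal equations (B^T B + lam L^T L) theta = B^T y."""
    A = B.T @ B + lam * (L.T @ L)
    return np.linalg.solve(A, B.T @ y)

# Synthetic one-dimensional data (assumption: not the diabetes data set).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.abs(x - 0.5) + 0.05 * rng.standard_normal(x.size)

B = hinge_basis(x, knots=[0.25, 0.5, 0.75])
L = np.eye(B.shape[1])  # assumption: identity penalty, i.e. ridge regression

theta_small = tikhonov_fit(B, y, L, lam=1e-3)  # weak penalty
theta_large = tikhonov_fit(B, y, L, lam=10.0)  # strong penalty

rss_small = float(np.sum((y - B @ theta_small) ** 2))
rss_large = float(np.sum((y - B @ theta_large) ** 2))
print("RSS:", rss_small, rss_large)
print("penalty norm:", np.linalg.norm(L @ theta_small), np.linalg.norm(L @ theta_large))
```

The paired hinge columns are linearly dependent on the intercept, so the unpenalized least-squares problem is ill-posed here; the penalty term is what makes the system solvable. Increasing `lam` trades a larger residual sum of squares for a smaller coefficient norm, which is the trade-off a regularization parameter controls.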
