A parallel solver for generalised additive models

This paper describes an implementation of the backfitting algorithm for generalised additive models that is suitable for parallel computing. The implementation is designed to handle large data sets, such as those arising in data mining, with several million observations on several hundred variables. For data sets of this size, a fast parallel implementation is crucial if generalised additive models are to be fitted quickly enough to allow exploratory analysis of the data. The approach divides the data into several blocks (groups) and fits a (generalised) additive model to each block. These models are then merged into a single final model. This approach is shown to be very efficient because it allows the algorithm to adapt to the structure of the parallel computer (number of processors and amount of internal memory).
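The block-and-merge idea can be illustrated with a minimal sketch. It is not the implementation described above, and it rests on several assumptions that the abstract does not state: the model is a plain additive model with identity link (not a full generalised additive model), the smoother is a simple piecewise-constant bin average on a shared grid, and the block models are merged by averaging their component functions weighted by block size. The names `bin_smooth`, `backfit`, `fit_blocks` and `predict` are hypothetical. The blocks are fitted in a serial loop here; since each block fit is independent of the others, the loop is the natural place to distribute work across processors.

```python
import numpy as np


def bin_index(x, edges):
    """Map values to bin numbers on the shared grid `edges` (assumed smoother grid)."""
    return np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)


def bin_smooth(x, r, edges):
    """Piecewise-constant smoother: mean of the residuals r within each bin of x."""
    idx = bin_index(x, edges)
    n_bins = len(edges) - 1
    sums = np.bincount(idx, weights=r, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    # Empty bins get the value 0 instead of a division by zero.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return means, idx


def backfit(X, y, edges, n_iter=20):
    """Backfitting for y = alpha + sum_j f_j(x_j) + noise on one block of data."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((p, len(edges) - 1))  # bin values of each component function
    fitted = np.zeros((p, n))          # f_j evaluated at the observations
    for _ in range(n_iter):
        for j in range(p):
            # Partial residuals: remove the intercept and all components except j.
            partial = y - alpha - fitted.sum(axis=0) + fitted[j]
            means, idx = bin_smooth(X[:, j], partial, edges)
            means -= means[idx].mean()  # centre the component over the data
            f[j] = means
            fitted[j] = means[idx]
    return alpha, f


def fit_blocks(X, y, edges, n_blocks):
    """Fit one additive model per block, then merge by size-weighted averaging.

    The averaging rule is an assumption made for this sketch.
    """
    X_blocks = np.array_split(X, n_blocks)
    y_blocks = np.array_split(y, n_blocks)
    # Each iteration is independent, so this loop could run in parallel.
    fits = [backfit(Xb, yb, edges) for Xb, yb in zip(X_blocks, y_blocks)]
    weights = np.array([len(yb) for yb in y_blocks], dtype=float)
    weights /= weights.sum()
    alpha = sum(w * a for w, (a, _) in zip(weights, fits))
    f = sum(w * fb for w, (_, fb) in zip(weights, fits))
    return alpha, f


def predict(alpha, f, X, edges):
    """Evaluate the merged model at the rows of X."""
    idx = bin_index(X, edges)
    return alpha + sum(f[j, idx[:, j]] for j in range(X.shape[1]))
```

Because every block uses the same grid, the merge step only has to combine small arrays of bin values, so its cost does not grow with the number of observations. A typical call would be `alpha, f = fit_blocks(X, y, np.linspace(0, 1, 21), n_blocks=4)` for covariates scaled to the unit interval.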
