Parallelisation of sparse grids for large scale data analysis

Sparse grids are the basis for efficient high-dimensional approximation and have recently been applied successfully to predictive modelling. They are spanned by a collection of simpler function spaces, each represented by a regular grid. The sparse grid combination technique prescribes how approximations on a collection of anisotropic grids can be combined to approximate high-dimensional functions. In this paper we study the parallelisation of fitting data onto a sparse grid. The computation can be done entirely by fitting partial models on a collection of regular grids, which allows parallelism over the collection of grids. In addition, each partial grid fit can itself be parallelised: in the assembly phase, where the parallelism is over the data, and in the solution stage, using traditional parallel solvers for the resulting PDEs. Using a simple timing model we confirm that the most effective methods are obtained when both types of parallelism are used.
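
As a rough illustration of the coarse-grained parallelism described above, the following sketch enumerates the component grids of the classical combination technique and hands one task per grid to a worker pool. It is a minimal sketch under stated assumptions, not the method of the paper: the level convention (every component level at least 1, level sums n + d - 1 - q with coefficients (-1)^q C(d-1, q)) is the textbook one and may differ from the variant used here, and the names `combination_grids`, `grid_points` and `run_over_grids` are hypothetical. The per-grid work is a stand-in that only counts grid nodes; a real partial fit would assemble and solve a linear system on that grid.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from math import comb


def combination_grids(dim, level):
    """Return (level_vector, coefficient) pairs of the classical combination technique.

    Assumed convention: every component level is >= 1, and the grids with
    level sum level + dim - 1 - q carry the coefficient (-1)**q * C(dim-1, q).
    """
    grids = []
    for q in range(dim):
        coeff = (-1) ** q * comb(dim - 1, q)
        target = level + (dim - 1) - q
        # Brute-force enumeration; adequate for the small dim/level of a sketch.
        for lv in product(range(1, target + 1), repeat=dim):
            if sum(lv) == target:
                grids.append((lv, coeff))
    return grids


def grid_points(level_vector):
    """Stand-in for a partial fit: count the nodes of a regular grid
    with 2**l + 1 points along a dimension of level l (an assumption)."""
    n = 1
    for l in level_vector:
        n *= 2 ** l + 1
    return n


def run_over_grids(dim, level, workers=4):
    """Process every component grid as an independent task."""
    grids = combination_grids(dim, level)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(grid_points, [lv for lv, _ in grids]))
    # Pair each grid with its coefficient and the result of its task.
    return [(lv, c, s) for (lv, c), s in zip(grids, sizes)]


if __name__ == "__main__":
    for lv, coeff, size in run_over_grids(dim=2, level=3):
        print(lv, coeff, size)
```

Because the tasks share no state, the same loop could be distributed across processes or machines; the second level of parallelism mentioned in the abstract (over the data and inside the solver) would live inside the per-grid task. A useful sanity check for any such enumeration is that the coefficients sum to one.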
