Classification with sparse grids using simplicial basis functions

Recently we presented a new approach [20] to the classification problem arising in data mining. It is based on the regularization network approach, but in contrast to other methods, which employ ansatz functions associated with data points, we use a grid in the usually high-dimensional feature space for the minimization process. To cope with the curse of dimensionality, we employ sparse grids [52]. Thus, only O(h_n^{-1} n^{d-1}) instead of O(h_n^{-d}) grid points and unknowns are involved. Here d denotes the dimension of the feature space and h_n = 2^{-n} gives the mesh size. We use the sparse grid combination technique [30], where the classification problem is discretized and solved on a sequence of conventional grids with uniform mesh sizes in each dimension. The sparse grid solution is then obtained by linear combination of these partial solutions. The method computes a nonlinear classifier but scales only linearly with the number of data points, and it is well suited for data mining applications where the amount of data is very large but the dimension of the feature space is moderately high. In contrast to our former work, where d-linear functions were used, we now apply linear basis functions based on a simplicial discretization. This allows us to handle more dimensions, and the algorithm needs fewer operations per data point. We further extend the method to so-called anisotropic sparse grids, where different a priori chosen mesh sizes can be used for the discretization of each attribute. This can improve the run time of the method and the approximation results for data sets whose attributes differ in importance. We describe the sparse grid combination technique for the classification problem, give implementation details, and discuss the complexity of the algorithm. It turns out that the method scales linearly with the number of given data points. Finally, we report on the quality of the classifier built by our new method on data sets with up to 14 dimensions. We show that our new method achieves correctness rates which are competitive with those of the best existing methods.
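
The combination technique mentioned above can be illustrated with a small sketch. It assumes the standard form of the combination formula from the sparse grid literature, in which the partial solutions on all grids with level vector l = (l_1, ..., l_d), l_i >= 1 and l_1 + ... + l_d = n + (d - 1) - q are weighted by (-1)^q * binomial(d - 1, q) for q = 0, ..., d - 1. The function names, the choice l_i >= 1, and the point count 2^{l_i} + 1 per dimension (boundary points included) are assumptions made for illustration and are not taken from the text. The sketch only enumerates the grids and counts their points; it does not solve the classification problem.

```python
from itertools import product
from math import comb, prod


def combination_grids(n, d):
    """List (level_vector, coefficient) pairs for level n in d dimensions.

    Assumption: level indices l_i >= 1 and the standard combination
    formula with l_1 + ... + l_d = n + (d - 1) - q for q = 0, ..., d - 1.
    """
    grids = []
    for q in range(d):
        target = n + (d - 1) - q
        coeff = (-1) ** q * comb(d - 1, q)
        # Each l_i is at most n, so brute-force enumeration over 1..n suffices.
        for levels in product(range(1, n + 1), repeat=d):
            if sum(levels) == target:
                grids.append((levels, coeff))
    return grids


def points_on_grid(levels):
    """Points of a conventional grid with mesh size 2^-l_i in dimension i.

    Assumption: boundary points are included, giving 2^l_i + 1 per dimension.
    """
    return prod(2 ** l + 1 for l in levels)


def total_combination_points(n, d):
    """Total number of points summed over all grids used in the combination."""
    return sum(points_on_grid(levels) for levels, _ in combination_grids(n, d))


def full_grid_points(n, d):
    """Points of the full grid with mesh size 2^-n in every dimension."""
    return (2 ** n + 1) ** d


if __name__ == "__main__":
    n, d = 6, 4
    print("grids in the combination:", len(combination_grids(n, d)))
    print("points over all combination grids:", total_combination_points(n, d))
    print("points on the full grid:", full_grid_points(n, d))
```

Each grid in the list is a conventional full grid and can be treated independently; the weighted sum of the partial solutions then gives the combined solution. The coefficients sum to one, so a function that is represented exactly on every grid is reproduced by the combination.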

[1]  G. Baszenski n-th Order Polynomial Spline Blending , 1985 .

[2]  Michael Griebel,et al.  A combination technique for the solution of sparse grid problems , 1990, Forschungsberichte, TU Munich.

[3]  Michael Griebel,et al.  The Combination Technique for the Sparse Grid Solution of PDE's on Multiprocessor Machines , 1992, Parallel Process. Lett..

[4]  Karin Frank,et al.  Information Complexity of Multivariate Fredholm Integral Equations in Sobolev Classes , 1996, J. Complex..

[5]  M. Griebel,et al.  On the computation of the eigenproblems of hydrogen helium in strong magnetic and electric fields with the sparse grid combination technique , 2000 .

[6]  H. Freudenthal Simplizialzerlegungen von beschränkter Flachheit , 1942 .

[7]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[8]  Tomaso A. Poggio,et al.  Regularization Networks and Support Vector Machines , 2000, Adv. Comput. Math..

[9]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[10]  Brian D. Ripley,et al.  Neural Networks and Related Methods for Classification , 1994 .

[11]  Steven Salzberg,et al.  On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach , 1997, Data Mining and Knowledge Discovery.

[12]  Harold W. Kuhn,et al.  Some Combinatorial Lemmas in Topology , 1960, IBM J. Res. Dev..

[13]  Hans-Joachim Bungartz,et al.  Dünne Gitter und deren Anwendung bei der adaptiven Lösung der dreidimensionalen Poisson-Gleichung , 1992 .

[14]  Michael Griebel,et al.  On the Parallelization of the Sparse Grid Approach for Data Mining , 2001, LSSC.

[15]  M. Griebel,et al.  Optimized Tensor-Product Approximation Spaces , 2000 .

[16]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[17]  T. Störtkuhl,et al.  On the Parallel Solution of 3D PDEs on a Network of Workstations and on Vector Computers , 1993, Parallel Computer Architectures.

[18]  V. N. Temlyakov Approximation of functions with bounded mixed derivative , 1989 .

[19]  Glenn Fung,et al.  Proximal support vector machine classifiers , 2001, KDD '01.

[20]  Tomaso A. Poggio,et al.  Regularization Theory and Neural Networks Architectures , 1995, Neural Computation.

[21]  Michael Griebel,et al.  Data Mining with Sparse Grids , 2001, Computing.

[22]  Hans-Joachim Bungartz,et al.  Pointwise Convergence Of The Combination Technique For Laplace's Equation , 1994 .

[23]  Sameer Singh,et al.  2D spiral pattern recognition with possibilistic measures , 1998, Pattern Recognit. Lett..

[24]  Thomas Gerstner,et al.  Numerical integration using sparse grids , 2004, Numerical Algorithms.

[25]  W. Sickel,et al.  Interpolation on Sparse Grids and Tensor Products of Nikol'skij–Besov Spaces , 1999 .

[26]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[27]  Michael Griebel,et al.  Sparse grids for boundary integral equations , 1999, Numerische Mathematik.

[28]  J. R. Wallis,et al.  Some ecological consequences of a computer model of forest growth , 1972 .

[29]  William D. Penny,et al.  Bayesian neural networks for classification: how useful is the evidence framework? , 1999, Neural Networks.

[30]  E. Arge,et al.  Approximation of scattered data using smooth grid functions , 1995 .

[31]  Eric R. Ziegel,et al.  Mastering Data Mining , 2001, Technometrics.

[32]  S. Odewahn,et al.  Automated star/galaxy discrimination with neural networks , 1992 .

[33]  Michael Griebel,et al.  The efficient solution of fluid dynamics problems by the combination technique , 1995, Forschungsberichte, TU Munich.

[34]  Robert A. Lordo,et al.  Learning from Data: Concepts, Theory, and Methods , 2001, Technometrics.

[35]  David R. Musicant,et al.  Lagrangian Support Vector Machines , 2001, J. Mach. Learn. Res..

[36]  Witold Pedrycz,et al.  Data Mining Methods for Knowledge Discovery , 1998, IEEE Trans. Neural Networks.

[37]  F. Utreras Cross-validation techniques for smoothing spline functions in one or two dimensions , 1979 .

[38]  Federico Girosi,et al.  An Equivalence Between Sparse Approximation and Support Vector Machines , 1998, Neural Computation.

[39]  G. Wahba Spline models for observational data , 1990 .