High Dimensional Smoothing Based on Multilevel Analysis

A fundamental issue in data mining is the development of algorithms that extract useful information from very large databases. One important technique is to estimate a smooth function that approximates the data. Such an approximation can be used, for example, for visualisation, prediction, or classification. However, the number of observations can be of the order of millions, and hundreds of variables may be recorded, so one has to deal with the so-called "curse of dimensionality". The algorithmic complexity of this estimation is typically of the order $\gamma^{3d-2}$, where $\gamma$ is the number of grid points in each dimension and $d$ is the number of dimensions. We propose a method for approximating a high dimensional surface by computing a projection onto multi-level spaces of low density, and we demonstrate that the algorithmic complexity of this method is proportional to $\left(j^{d-1}(2^{j+1}-1)\right)^{3}$, where $j = \lfloor \log_2 \gamma \rfloor$, which is a substantial reduction in computational work. In addition, we show that the approximation error is proportional to $\binom{j+d-1}{d-1}\, 2^{-2j}$, with the proportionality constant depending on the smoothness of the computed surface.
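To get a feel for the three expressions quoted in the abstract, the following minimal Python sketch evaluates them for given values of the grid size and the dimension. The function names (`typical_cost`, `multilevel_cost`, `error_factor`) are hypothetical and chosen here for illustration only. All proportionality constants are omitted, so the numbers indicate growth rates rather than actual operation counts or errors.

```python
from math import comb


def level(gamma: int) -> int:
    """Return j = floor(log2(gamma)) using exact integer arithmetic."""
    if gamma < 1:
        raise ValueError("gamma must be a positive integer")
    return gamma.bit_length() - 1


def typical_cost(gamma: int, d: int) -> int:
    """Growth term gamma^(3d-2) quoted for the typical approach (constant omitted)."""
    return gamma ** (3 * d - 2)


def multilevel_cost(gamma: int, d: int) -> int:
    """Growth term (j^(d-1) * (2^(j+1) - 1))^3 quoted for the multilevel method."""
    j = level(gamma)
    return (j ** (d - 1) * (2 ** (j + 1) - 1)) ** 3


def error_factor(gamma: int, d: int) -> float:
    """Growth term C(j+d-1, d-1) * 2^(-2j) quoted for the approximation error."""
    j = level(gamma)
    return comb(j + d - 1, d - 1) * 2.0 ** (-2 * j)


if __name__ == "__main__":
    gamma = 32  # assumed example value: 32 grid points per dimension
    for d in (2, 3, 4, 6):
        print(
            f"d={d}: typical={typical_cost(gamma, d):.3e}, "
            f"multilevel={multilevel_cost(gamma, d):.3e}, "
            f"error factor={error_factor(gamma, d):.3e}"
        )
```

Running the script for increasing $d$ at a fixed $\gamma$ shows how the gap between the two cost terms widens with dimension, while the error term grows only through the binomial factor.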
