Generalized Functional ANOVA Diagnostics for High-Dimensional Functions of Dependent Variables

This article studies the problem of providing diagnostics for high-dimensional functions when the input variables are known to be dependent. In such situations, commonly used diagnostics can place unduly large emphasis on functional behavior that occurs in regions of very low probability. A generalized functional ANOVA decomposition instead provides a natural representation of the function in terms of low-order components. This article details a weighted functional ANOVA that controls for the effect of dependence between input variables. The construction involves high-dimensional functions as nuisance parameters, and the article suggests a novel scheme for estimating the decomposition. The methodology is demonstrated in the context of machine learning, where the possibility of poor extrapolation makes it important to restrict attention to regions of high data density.
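
As a rough illustration of what a weighted functional ANOVA looks like, the decomposition can be written as a weighted least-squares projection. The notation here is an assumption introduced for illustration and does not come from the abstract: $F$ is the function being diagnosed, $x_u$ is the subvector of inputs indexed by $u \subseteq \{1,\dots,d\}$, and $w$ is a weight function, for example an estimate of the input density. This is a sketch of one common formulation, not necessarily the exact construction used in the article.

```latex
% Sketch under assumed notation: F, w, f_u, g_u and x_u are illustrative symbols.
\begin{align}
  F(x) &= \sum_{u \subseteq \{1,\dots,d\}} f_u(x_u), \\
  \{f_u\} &= \operatorname*{arg\,min}_{\{g_u\}}
     \int \Bigl( \sum_{u} g_u(x_u) - F(x) \Bigr)^{2} w(x)\, dx, \\
  \text{subject to}\quad
     & \int f_u(x_u)\, f_v(x_v)\, w(x)\, dx = 0
       \quad \text{for all } v \subsetneq u .
\end{align}
```

When $w$ is uniform on the unit cube, these hierarchical orthogonality conditions reduce to the usual requirement that each component integrates to zero over any of its own arguments, which gives the standard functional ANOVA. A weight concentrated where data are dense downweights regions of low probability, which is the behavior the abstract calls for.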
