High-Dimensional Density Estimation via SCA: An Example in the Modelling of Hurricane Tracks

We present nonparametric techniques for constructing and verifying density estimates from high-dimensional data whose irregular dependence structure cannot be modelled by parametric multivariate distributions. A low-dimensional representation of the data is critical in such situations because of the curse of dimensionality. Our proposed methodology consists of three main parts: (1) data reparameterization via dimensionality reduction, wherein the data are mapped into a space where standard techniques can be used for density estimation and simulation; (2) inverse mapping, in which simulated points are mapped back to the high-dimensional input space; and (3) verification, in which the quality of the estimate is assessed by comparing simulated samples with the observed data. We illustrate these approaches by exploring the spatial variability of tropical cyclones (TCs) in the North Atlantic; each datum in this case is an entire hurricane trajectory. We conclude the paper with a discussion of extending the methods to model the relationship between TC variability and climatic variables.
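The three-part workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: linear PCA stands in for the SCA reparameterization, the data are synthetic rather than hurricane tracks, the density estimate is a Gaussian kernel estimate with a Scott's-rule bandwidth, and the verification step uses an energy-distance permutation test chosen here purely for illustration. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for track data (assumption): each row plays the role of
# one flattened trajectory, generated from k latent factors plus small noise.
n, d, k = 300, 40, 2
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = latent @ mixing + 0.05 * rng.normal(size=(n, d))

# (1) Reparameterization. PCA is used here as a simple stand-in for SCA.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:k]                       # shape (k, d)
Z = (X - mean) @ components.T             # reduced coordinates, shape (n, k)

# Density estimation and simulation in the reduced space. Sampling from a
# Gaussian kernel density estimate amounts to picking an observed point at
# random and adding kernel noise; the bandwidth follows Scott's rule.
h = Z.std(axis=0, ddof=1) * n ** (-1.0 / (k + 4))
m = 300
idx = rng.integers(0, n, size=m)
Z_sim = Z[idx] + h * rng.normal(size=(m, k))

# (2) Inverse mapping. For PCA this is the linear reconstruction; a nonlinear
# embedding would need a separate pre-image step.
X_sim = Z_sim @ components + mean

# (3) Verification. Compare observed and simulated samples in the input
# space with an energy-distance permutation test (an assumed choice).
pooled = np.vstack([X, X_sim])
sq = (pooled ** 2).sum(axis=1)
D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * pooled @ pooled.T, 0.0))

def energy_stat(a, b):
    """Energy statistic between the index sets a and b of the pooled sample."""
    return (2.0 * D[np.ix_(a, b)].mean()
            - D[np.ix_(a, a)].mean()
            - D[np.ix_(b, b)].mean())

observed = energy_stat(np.arange(n), np.arange(n, n + m))
n_perm = 200
perm_stats = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(n + m)
    perm_stats[i] = energy_stat(perm[:n], perm[n:])

# Permutation p-value with the usual +1 correction.
p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)
print(f"energy statistic = {observed:.4f}, permutation p-value = {p_value:.3f}")
```

A small p-value would suggest that the simulated sample is distinguishable from the observed one, which in this sketch can happen simply because the kernel noise and the discarded components change the distribution slightly. The structure carries over if the PCA step is replaced by a nonlinear embedding, provided an inverse mapping is available.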
