Neighborhood Graphs for Estimation of Density Functionals

Functionals of densities play a fundamental role in statistics, signal processing, machine learning, information theory, and related fields. This class of functionals includes entropy, divergence, and mutual information measures of densities; intrinsic dimension of data embedded in manifolds; and minimum volume sets of densities. k-nearest neighbor (k-NN) graph based estimators are widely used to estimate these functionals. While several consistent k-NN estimators have been proposed, general results on their rates of convergence and confidence intervals on the estimated functionals are not available. Since the rate of convergence relates the number of samples to the performance of the estimator, convergence rates have great practical utility. In this thesis, a new class of estimators based on bipartite k-nearest neighbor graphs is proposed for estimating functionals of probability density functions. This class includes entropy and divergence estimators, intrinsic dimension estimators, and estimates of p-values for testing membership of data in minimum volume sets. For this class of estimators, large sample theory is used to characterize performance: large sample expressions for estimator bias and variance are derived, and a central limit theorem for the distribution of the estimators is established. This theory is applied to accurately estimate functionals of interest by optimizing the mean squared error over free parameters, e.g., the number of neighbors k, and to obtain confidence intervals on the estimated functional by invoking the central limit theorem. Furthermore, this theory provides significant insight into the statistical behavior of these estimators.
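To make the k-NN approach concrete, the following is a minimal sketch of the classical Kozachenko-Leonenko k-NN estimator of differential entropy (not the bipartite-graph estimator proposed in the thesis, just the standard baseline it builds on). It uses only the Python standard library; the brute-force neighbor search and the choice k=4 are illustrative assumptions, not prescriptions.

```python
import math
import random

def knn_entropy(samples, k=4):
    """Kozachenko-Leonenko k-NN estimate of differential entropy.

    samples: list of d-dimensional points (lists/tuples of floats).
    k: number of neighbors (a free parameter; MSE-optimal choice
       depends on sample size and dimension).
    """
    n = len(samples)
    d = len(samples[0])

    # For integer m, the digamma function satisfies
    # psi(m) = -euler_gamma + sum_{j=1}^{m-1} 1/j (exact).
    euler_gamma = 0.5772156649015329
    def psi(m):
        return -euler_gamma + sum(1.0 / j for j in range(1, m))

    # log-volume of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)
    log_vd = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)

    # eps_i = Euclidean distance from x_i to its k-th nearest neighbor
    # (brute force O(n^2); a k-d tree would be used in practice)
    log_eps_sum = 0.0
    for i, x in enumerate(samples):
        dists = sorted(math.dist(x, y)
                       for j, y in enumerate(samples) if j != i)
        log_eps_sum += math.log(dists[k - 1])

    # H_hat = psi(n) - psi(k) + log V_d + (d/n) * sum_i log eps_i
    return psi(n) - psi(k) + log_vd + (d / n) * log_eps_sum

if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0.0, 1.0)] for _ in range(500)]
    h = knn_entropy(data, k=4)
    # true differential entropy of N(0,1): 0.5*log(2*pi*e) ~= 1.419
    print(h)
```

The estimator's bias and variance both depend on k and n, which is exactly why the large sample expressions derived in the thesis are useful: they allow k to be chosen to minimize mean squared error rather than by rule of thumb.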
