Robust entropy estimation strategies based on edge weighted random graphs

In this paper we treat the problem of robust entropy estimation given a multidimensional random sample from an unknown distribution. In particular, we consider estimation of the Renyi entropy of fractional order which is insensitive to outliers, e.g. high variance contaminating distributions, using the k-point minimal spanning tree. A greedy algorithm for approximating the NP-hard problem of computing the k-minimal spanning tree is given which is a generalization of the potential function partitioning method of Ravi et al. The basis for our approach is asymptotic theorem establishing that the log of the overall length or weight of the greedy approximation is a strongly consistent estimator of the Renyi entropy. Quantitative robustness of the estimator to outliers is established using Hampel's method of influence functions. The structure of the influence function indicates that the k-MST is a natural extension of the 1D, (alpha) -trimmed mean for multi- dimensional data.