Declutter and Resample: Towards Parameter Free Denoising

In many data analysis applications the following scenario is commonplace: we are given a point set that is supposed to sample a hidden ground truth K in a metric space, but it has been corrupted with noise, so that some of the data points lie far away from K, creating outliers, also termed ambient noise. One of the main goals of denoising algorithms is to eliminate such noise so that the curated data lie within a bounded Hausdorff distance of K. Popular denoising approaches such as deconvolution and thresholding often require the user to set several parameters and/or to choose an appropriate noise model, while guaranteeing only asymptotic convergence. Our goal is to lighten this burden as much as possible while ensuring theoretical guarantees in all cases. Specifically, we first propose a simple denoising algorithm that requires only a single parameter but provides a theoretical guarantee on the quality of the output for general input points. We argue that this single parameter cannot be avoided. We then present a simple algorithm that avoids even this parameter, at the cost of a slightly stronger, but still realistic, sampling condition on the input points. We also provide some preliminary empirical evidence that our algorithms are effective in practice.
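
To make the quality criterion above concrete, the following minimal sketch computes the Hausdorff distance between two finite point sets and shows how a single outlier dominates it. This is only an illustration of the distance, not the denoising algorithm proposed in the paper. The function names and the toy data are hypothetical, and the ground truth K is approximated here by a finite sample, which is an assumption made for the example.

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from any point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between finite point sets A and B."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical toy data: a finite sample of the segment [0, 1] x {0}
# stands in for the hidden ground truth K.
truth = [(x / 10, 0.0) for x in range(11)]

# The input is the same sample plus one far-away outlier (ambient noise).
outlier = (0.5, 3.0)
noisy = truth + [outlier]

# A perfect denoiser would drop the outlier and keep everything else.
cleaned = [p for p in noisy if p != outlier]

dist_noisy = hausdorff(noisy, truth)      # dominated by the outlier
dist_cleaned = hausdorff(cleaned, truth)  # sets coincide after cleaning
print(dist_noisy, dist_cleaned)
```

In this toy setting a single outlier alone determines the distance for the noisy input, which is why removing ambient noise is a prerequisite for any bound on the Hausdorff distance to K.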
