The functional mean-shift algorithm for mode hunting and clustering in infinite dimensions

We introduce the functional mean-shift algorithm, an iterative algorithm for estimating the local modes of a surrogate density from functional data. We show that the algorithm can be used for cluster analysis of functional data, and we propose a bootstrap-based test for the significance of the estimated local modes of the surrogate density. We present two applications of our methodology. In the first, we demonstrate how the functional mean-shift algorithm can be used to perform spike sorting, that is, to cluster neural activity curves. In the second, we use the algorithm to distinguish between genuine and forged signatures.
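To make the iteration concrete, the following is a minimal sketch of a mean-shift update applied to curves observed on a common grid. It is not the authors' implementation. The Gaussian kernel, the Riemann-sum approximation of the L2 distance, the fixed bandwidth, and the names `functional_mean_shift` and `label_modes` are all assumptions made for illustration. The paper's surrogate density, bandwidth selection, and bootstrap test are not reproduced here.

```python
import numpy as np

def l2_distance(x, curves, dt):
    """Riemann-sum approximation of the L2 distance from x to each row of curves."""
    return np.sqrt(np.sum((curves - x) ** 2, axis=1) * dt)

def functional_mean_shift(curves, dt, bandwidth, tol=1e-8, max_iter=500):
    """Move every curve uphill by repeated kernel-weighted averaging (assumed Gaussian kernel)."""
    modes = np.empty_like(curves)
    for i in range(len(curves)):
        x = curves[i].copy()
        for _ in range(max_iter):
            d = l2_distance(x, curves, dt)
            w = np.exp(-0.5 * (d / bandwidth) ** 2)   # kernel weights
            x_new = w @ curves / w.sum()              # weighted mean of the sample curves
            shift = np.sqrt(np.sum((x_new - x) ** 2) * dt)
            x = x_new
            if shift < tol:
                break
        modes[i] = x
    return modes

def label_modes(modes, dt, merge_tol):
    """Group curves whose limit points lie within merge_tol of each other."""
    labels = np.full(len(modes), -1, dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.sqrt(np.sum((m - c) ** 2) * dt) < merge_tol:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Synthetic illustration (hypothetical data): two groups of sine curves
# with opposite sign and slightly perturbed amplitudes.
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
rng = np.random.default_rng(0)
amps_a = rng.normal(1.0, 0.05, size=20)
amps_b = rng.normal(-1.0, 0.05, size=20)
curves = np.vstack([a * np.sin(2 * np.pi * t) for a in np.concatenate([amps_a, amps_b])])

modes = functional_mean_shift(curves, dt, bandwidth=0.3)
labels, centers = label_modes(modes, dt, merge_tol=0.1)
print("number of estimated modes:", len(centers))
```

Each curve is assigned to the mode its iteration converges to, which is how mode estimation doubles as clustering. In this sketch the bandwidth and the merge tolerance are set by hand. In practice both would need a data-driven choice.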
