Approximating $(k,\ell)$-Median Clustering for Polygonal Curves

In 2015, Driemel, Krivošija and Sohler introduced the $(k,\ell)$-median problem for clustering polygonal curves under the Fréchet distance. Given a set of input curves, the problem asks for $k$ median curves of at most $\ell$ vertices each that minimize the sum, over all input curves, of the Fréchet distance to the closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this paper, we present a randomized bicriteria approximation algorithm that works for polygonal curves in $\mathbb{R}^d$ and achieves approximation factor $(1+\varepsilon)$ with respect to the clustering costs. The algorithm has worst-case running time linear in the number of curves, polynomial in the maximum number of vertices per curve (i.e., their complexity), and exponential in $\varepsilon$ and in $\delta$, the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve whose cost is similar to that of an optimal median curve of complexity $\ell$, whose complexity is at most $2\ell-2$, and whose vertices can be computed efficiently. We combine this lemma with the superset-sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.

∗ Faculty of Mathematics, Ruhr-University Bochum, Germany, maike.buchin@rub.de
† Hausdorff Center for Mathematics, University of Bonn, Germany, driemel@cs.uni-bonn.de
‡ Faculty of Mathematics, Ruhr-University Bochum, Germany, dennis.rohde-t1b@rub.de

arXiv:2009.01488v2 [cs.CG] 4 Sep 2020
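To make the objective in the abstract concrete, the following is a minimal sketch of how the $(k,\ell)$-median cost of a candidate solution could be evaluated. It rests on two assumptions that are not from the paper: curves are represented as lists of coordinate tuples, and the discrete Fréchet distance (computed on vertices only) stands in for the continuous Fréchet distance that the paper works with. The function names `discrete_frechet` and `k_l_median_cost` are hypothetical, and the sketch only evaluates a given solution; it is not the approximation algorithm itself.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Frechet distance between two vertex sequences.

    Assumption: this vertex-only variant is used as a simple stand-in
    for the continuous Frechet distance considered in the paper.
    """
    n, m = len(p), len(q)
    # table[i][j] holds the discrete Frechet distance of p[:i+1] and q[:j+1].
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_prev = min(
                    table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
                )
                table[i][j] = max(best_prev, d)
    return table[n - 1][m - 1]


def k_l_median_cost(curves, medians, max_vertices):
    """Sum over all input curves of the distance to the closest median.

    max_vertices is the allowed complexity of a median curve: ell for an
    exact solution, or 2 * ell - 2 for a bicriteria solution.
    """
    for c in medians:
        if len(c) > max_vertices:
            raise ValueError("median curve exceeds the allowed complexity")
    return sum(min(discrete_frechet(t, c) for c in medians) for t in curves)


# Hypothetical toy instance: three horizontal segments, k = 2, ell = 2.
curves = [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(0, 10), (1, 10)]]
medians = [[(0, 0), (1, 0)], [(0, 10), (1, 10)]]
cost = k_l_median_cost(curves, medians, max_vertices=2)
```

In this toy instance the first and third curves coincide with a median and the second lies at distance 1 from the first median, so `cost` evaluates to 1.0. Passing `max_vertices=2 * ell - 2` instead of `ell` corresponds to the relaxed complexity bound that the shortcutting lemma allows.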
