Paper information - Ultra Fast Medoid Identification via Correlated Sequential Halving

Ultra Fast Medoid Identification via Correlated Sequential Halving

The medoid of a set of n points is the point in the set that minimizes the sum of distances to other points. It can be determined exactly in O(n^2) time by computing the distances between all pairs of points. Previous works show that one can significantly reduce the number of distance computations needed by adaptively querying distances. The resulting randomized algorithm is obtained by a direct conversion of the computation problem to a multi-armed bandit statistical inference problem. In this work, we show that we can better exploit the structure of the underlying computation problem by modifying the traditional bandit sampling strategy and using it in conjunction with a suitably chosen multi-armed bandit algorithm. Four to five orders of magnitude gains over exact computation are obtained on real data, in terms of both number of distance computations needed and wall clock time. Theoretical results are obtained to quantify such gains in terms of data parameters. Our code is publicly available online.
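The two approaches contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code. `exact_medoid` is the O(n^2) baseline. `correlated_halving_medoid` is a hypothetical illustration of sequential halving in which, in each round, every surviving candidate is scored against the same randomly drawn reference points, so the estimates are correlated. The function names, the `budget` parameter, and the way the budget is split evenly across rounds are assumptions made for this example.

```python
import math
import random


def exact_medoid(points):
    """O(n^2) baseline: index of the point with the smallest total distance."""
    totals = [sum(math.dist(p, q) for q in points) for p in points]
    return min(range(len(points)), key=totals.__getitem__)


def correlated_halving_medoid(points, budget, seed=0):
    """Sequential-halving sketch with shared reference points per round.

    `budget` is an assumed cap on distance computations, split evenly
    over ceil(log2 n) rounds (an illustrative choice, not a tuned one).
    """
    if not points:
        raise ValueError("points must be non-empty")
    rng = random.Random(seed)
    n = len(points)
    candidates = list(range(n))
    rounds = max(1, math.ceil(math.log2(n)))
    for _ in range(rounds):
        if len(candidates) == 1:
            break
        # Number of reference points every candidate is compared against.
        t = min(n, max(1, budget // (len(candidates) * rounds)))
        # The SAME references are used for all candidates in this round.
        refs = rng.sample(range(n), t)
        est = {
            i: sum(math.dist(points[i], points[j]) for j in refs) / t
            for i in candidates
        }
        if t == n:
            # Every distance was computed, so the estimates are exact.
            return min(candidates, key=est.__getitem__)
        # Keep the half of the candidates with the smallest estimates.
        candidates.sort(key=est.__getitem__)
        candidates = candidates[: math.ceil(len(candidates) / 2)]
    return candidates[0]
```

With a small budget the second function computes far fewer distances than the baseline but may return a non-medoid point. Once the budget is large enough for a round to use all n reference points, it returns the exact answer.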

Tavor Z. Baharav | David N. Tse
