Paper Information - Reuse-Centric K-Means Configuration

Reuse-Centric K-Means Configuration

K-means configuration is time-consuming because k-means is iterative and every candidate configuration must be run to be evaluated. This paper proposes reuse-centric k-means configuration to accelerate the process. It rests on the observation that explorations of different configurations share many common or similar computations, so effectively reusing computations from prior trials can substantially shorten configuration time. The paper presents a set of novel techniques that realize this idea, including reuse-based filtering, center reuse, and a two-phase design, to capitalize on reuse opportunities at three levels: validation, k, and feature sets. Experiments show that the approach accelerates some common configuration tuning methods by 5-9X.
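To make the idea of reusing work across trials concrete, the following is a minimal sketch of warm-starting a sweep over k: the centers found for one value of k seed the run for the next value, instead of starting each trial from scratch. It is not the paper's implementation. The function names (`lloyd`, `sweep_k`) and the farthest-point rule used to add a center are assumptions chosen only for illustration, and the sketch covers reuse across k only, not reuse-based filtering or the two-phase design.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given starting centers.

    Returns (centers, sse, iterations_used).
    """
    centers = [tuple(c) for c in centers]
    labels = assign(points, centers)
    for it in range(1, max_iter + 1):
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(c))))
            else:
                new_centers.append(c)  # keep the old center for an empty cluster
        centers = new_centers
        new_labels = assign(points, centers)
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, sse, it


def sweep_k(points, ks, seed=0):
    """Try each k in increasing order, reusing the previous trial's centers.

    Hypothetical seeding rule (an assumption, not taken from the paper):
    each extra center is the point farthest from its nearest existing center.
    """
    rng = random.Random(seed)
    results = {}
    centers = None
    for k in sorted(ks):
        if centers is None:
            centers = rng.sample(points, k)  # first trial: random start
        else:
            while len(centers) < k:
                far = max(points,
                          key=lambda p: min(sq_dist(p, c) for c in centers))
                centers = centers + [far]
        centers, sse, iters = lloyd(points, centers)
        results[k] = (sse, iters)
    return results


# Toy data: three Gaussian blobs in 2-D.
_rng = random.Random(1)
points = [(_rng.gauss(cx, 0.5), _rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(40)]
results = sweep_k(points, [2, 3, 4])
for k, (sse, iters) in sorted(results.items()):
    print(k, round(sse, 3), iters)
```

Because each trial starts from centers that already fit the data for a smaller k, the within-cluster sum of squared errors cannot increase from one k to the next in this sketch, and later trials typically need fewer iterations than a cold start would.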
