Towards Automated Configuration of Stream Clustering Algorithms

Clustering is an important technique in data analysis that can reveal hidden patterns and unknown relationships in data. A common problem in clustering is choosing suitable parameter settings. Automated algorithm configuration addresses this problem by searching for well-performing parameter settings automatically. In practice, however, many of today's data sources are data streams, owing to the widespread deployment of sensors, the Internet of Things, and (social) media. Stream clustering aims to tackle this challenge by identifying, tracking, and updating clusters over time. Unfortunately, none of the existing approaches for automated algorithm configuration is directly applicable to the streaming scenario. In this paper, we explore the possibility of automated algorithm configuration for stream clustering algorithms using an ensemble of different configurations. In initial experiments, we demonstrate that our approach is able to automatically find superior configurations and refine them over time.
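To make the idea of an ensemble of configurations more concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a toy one-dimensional "leader" clustering with a single `radius` parameter, a hypothetical cost function (mean squared distance to the assigned center plus a per-cluster penalty), and a simple keep-the-better-half update rule. All function names, parameters, and defaults are illustrative assumptions.

```python
import random


def leader_cluster(points, radius):
    """Toy one-pass clustering (an assumption, not a real stream clusterer):
    a point joins the nearest center within `radius`, else opens a new cluster."""
    centers, counts, labels = [], [], []
    for x in points:
        best, best_d = None, radius
        for i, c in enumerate(centers):
            d = abs(x - c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            centers.append(x)
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            counts[best] += 1
            # Move the center to the running mean of its members.
            centers[best] += (x - centers[best]) / counts[best]
            labels.append(best)
    return centers, labels


def cost(points, radius, penalty=1.0):
    """Hypothetical quality measure, lower is better: mean squared distance
    to the assigned center plus a penalty for every cluster created."""
    centers, labels = leader_cluster(points, radius)
    sse = sum((x - centers[l]) ** 2 for x, l in zip(points, labels))
    return sse / len(points) + penalty * len(centers)


def configure_on_stream(stream, window=200, ensemble_size=5, seed=0):
    """Evaluate an ensemble of radius settings window by window, keep the
    better half, and refill with perturbed copies of the current best."""
    rng = random.Random(seed)
    ensemble = [rng.uniform(0.01, 20.0) for _ in range(ensemble_size)]
    history, buffer = [], []
    for x in stream:
        buffer.append(x)
        if len(buffer) < window:
            continue
        ranked = sorted(ensemble, key=lambda r: cost(buffer, r))
        best = ranked[0]
        history.append(best)
        keep = ranked[: (ensemble_size + 1) // 2]
        fresh = [max(0.01, best * rng.uniform(0.5, 1.5))
                 for _ in range(ensemble_size - len(keep))]
        ensemble = keep + fresh
        buffer = []  # start collecting the next window
    return history


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Synthetic stream: two Gaussian clusters around 0 and 10.
    stream = [data_rng.gauss(data_rng.choice([0.0, 10.0]), 0.5)
              for _ in range(1000)]
    print(configure_on_stream(stream))
```

Because the ensemble is re-ranked on every window, the selected configuration can change as the stream evolves, which is the property the streaming setting requires. A practical system would replace the toy clusterer and cost function with a real stream clustering algorithm and an established cluster quality index.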
