Providing Meaningful Data Summarizations Using Exemplar-based Clustering in Industry 4.0

Data summarizations are a valuable tool for deriving knowledge from large data streams and have proven useful in a great number of applications. Summaries can be found by optimizing submodular functions. These functions map subsets of data to real values that indicate their "representativeness"; maximizing them yields a diverse summary of the underlying data. In this paper, we study Exemplar-based clustering as a submodular function and provide a GPU algorithm to cope with its high computational complexity. We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision computation compared to conventional CPU algorithms. We also show that the GPU algorithm provides remarkable runtime benefits not only with workstation-grade GPUs but also with low-power embedded computation units, for which speedups of up to 35x are possible. Furthermore, we apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the summaries found help with steering this specific process to cut costs and reduce the manufacturing of bad parts. Beyond pure speedup considerations, we show that our approach can provide summaries within reasonable time frames for this kind of industrial, real-world data.
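To make the idea of a submodular "representativeness" score concrete, the following is a minimal CPU sketch in plain Python. It is not the GPU algorithm of the paper. It assumes one common formulation of Exemplar-based clustering from the submodular summarization literature, f(S) = L({e0}) - L(S ∪ {e0}), where L is the mean squared distance of each point to its nearest exemplar and e0 is an auxiliary all-zero vector, and it pairs this with the classic Greedy maximizer. All function names (`sq_dist`, `kmedoid_loss`, `exemplar_value`, `greedy_summary`) are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmedoid_loss(data: List[Point], exemplars: List[Point]) -> float:
    """Mean squared distance of every data point to its nearest exemplar."""
    return sum(min(sq_dist(v, e) for e in exemplars) for v in data) / len(data)


def exemplar_value(data: List[Point], summary: List[Point]) -> float:
    """Assumed objective: f(S) = L({e0}) - L(S + {e0}), with e0 the zero vector."""
    zero = [0.0] * len(data[0])
    return kmedoid_loss(data, [zero]) - kmedoid_loss(data, list(summary) + [zero])


def greedy_summary(data: List[Point], k: int) -> List[Point]:
    """Greedy maximization: repeatedly add the point with the largest marginal gain."""
    summary: List[Point] = []
    for _ in range(k):
        current = exemplar_value(data, summary)
        best_gain, best_point = 0.0, None
        for candidate in data:
            gain = exemplar_value(data, summary + [candidate]) - current
            if gain > best_gain:
                best_gain, best_point = gain, candidate
        if best_point is None:  # no candidate improves the summary any further
            break
        summary.append(best_point)
    return summary


if __name__ == "__main__":
    # Two well-separated groups of points; a summary of size 2 should cover both.
    data = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.1, 4.9]]
    summary = greedy_summary(data, 2)
    print(summary, exemplar_value(data, summary))
```

Each evaluation of `exemplar_value` touches every pair of data point and exemplar, and Greedy evaluates it once per candidate per round. This sketch therefore scales poorly with the data size, which illustrates the kind of computational cost that motivates offloading the distance computations to a GPU.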
