Event clustering & event series characterization on expected frequency

We present an efficient clustering algorithm applicable to one-dimensional data such as a series of timestamps. Given an expected frequency ΔT<sup>−1</sup>, we introduce an O(N) method of characterizing N events represented by an ordered series of timestamps t<sub>1</sub>, t<sub>2</sub>, …, t<sub>N</sub>. In practice, the method proves useful, for example, to identify time intervals of missing data or to locate isolated events. Moreover, we define measures that quantify a series of events as ΔT is varied, for example to determine the quality of an Internet of Things service.
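The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a single O(N) pass over ordered timestamps could work. It assumes that two consecutive events belong to the same cluster whenever their spacing does not exceed ΔT. The function names (`cluster_by_gap`, `isolated_events`, `missing_intervals`, `cluster_count_profile`) and the cluster-count measure are hypothetical illustrations, not the measures defined in the paper.

```python
from typing import List, Sequence, Tuple


def cluster_by_gap(timestamps: Sequence[float], delta_t: float) -> List[Tuple[int, int]]:
    """Group sorted timestamps into clusters in one pass, O(N).

    Assumption (not from the paper): consecutive events are in the same
    cluster when their spacing is at most delta_t.
    Returns inclusive (first_index, last_index) pairs.
    """
    if not timestamps:
        return []
    clusters = []
    start = 0
    for i in range(1, len(timestamps)):
        # A gap larger than the expected period closes the current cluster.
        if timestamps[i] - timestamps[i - 1] > delta_t:
            clusters.append((start, i - 1))
            start = i
    clusters.append((start, len(timestamps) - 1))
    return clusters


def isolated_events(clusters: List[Tuple[int, int]]) -> List[int]:
    """Indices of events that form a cluster on their own."""
    return [first for first, last in clusters if first == last]


def missing_intervals(timestamps: Sequence[float],
                      clusters: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Time intervals between consecutive clusters (candidate data gaps)."""
    return [
        (timestamps[clusters[k][1]], timestamps[clusters[k + 1][0]])
        for k in range(len(clusters) - 1)
    ]


def cluster_count_profile(timestamps: Sequence[float],
                          delta_ts: Sequence[float]) -> List[int]:
    """Hypothetical measure: number of clusters for each candidate delta_t."""
    return [len(cluster_by_gap(timestamps, dt)) for dt in delta_ts]


# Illustrative data only: six events with an expected period of about 1.
ts = [0.0, 1.0, 2.0, 10.0, 20.0, 21.0]
clusters = cluster_by_gap(ts, delta_t=1.5)
isolated = isolated_events(clusters)
gaps = missing_intervals(ts, clusters)
profile = cluster_count_profile(ts, [0.5, 1.5, 9.0, 10.0])
```

With this sample series and ΔT = 1.5, the event at t = 10 forms a cluster on its own, and the intervals (2, 10) and (10, 20) are flagged as candidate gaps. Sweeping ΔT shows how quickly the series merges into a single cluster, which is one simple way to summarize how regularly events arrive.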
