Classification and Annotation of Open Internet of Things Datastreams

The Internet of Things (IoT) is enabling novel applications and has led to the generation of massive amounts of data that can offer valuable insights across multiple domains, such as Smart Cities, environmental monitoring, and healthcare. In particular, the availability of open IoT data streaming from heterogeneous sources constitutes a powerful new knowledge base. However, due to the inherently distributed, heterogeneous, and open nature of such data, metadata describing it is generally lacking. This is especially true where IoT data is contributed by users via cloud-based open data platforms, on which even basic information, such as the type of data being measured, is often missing. Since metadata is of paramount importance for data reuse, there is a need for intelligent techniques that can automatically annotate heterogeneous IoT datastreams. In this paper, we propose two novel IoT datastream classification algorithms for metadata annotation, CBOS and TKSE. We validate the proposed techniques through extensive experiments on public IoT datasets, comparing the results with state-of-the-art classification methods. The results show that our techniques significantly improve classification accuracy.
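The annotation task described above can be framed as classification: given an unlabeled datastream, predict what it measures. The sketch below illustrates only that framing. It is not CBOS or TKSE, whose details are not given here, but a generic 1-nearest-neighbor baseline using z-normalized Euclidean distance. The function names `znorm` and `annotate`, the labels "temperature" and "occupancy", and the reference streams are all hypothetical, and the data is synthetic.

```python
import math
from typing import List, Optional, Sequence, Tuple


def znorm(series: Sequence[float]) -> List[float]:
    """Z-normalize a series so its shape, not its scale or offset, drives comparison."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant stream carries no shape information.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotate(
    stream: Sequence[float],
    labeled: Sequence[Tuple[Sequence[float], str]],
) -> Optional[str]:
    """Hypothetical 1-NN annotator: return the label of the closest labeled stream."""
    query = znorm(stream)
    best_label, best_dist = None, math.inf
    for series, label in labeled:
        d = euclidean(query, znorm(series))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


HOURS = range(24)
# Hypothetical labeled reference streams (synthetic, for illustration only).
reference = [
    ([20 + 5 * math.sin(2 * math.pi * h / 24 - math.pi / 2) for h in HOURS], "temperature"),
    ([1.0 if 8 <= h < 18 else 0.0 for h in HOURS], "occupancy"),
]
# An unlabeled stream with the same daily shape as the temperature reference,
# but reported on a different scale.
unlabeled = [68 + 9 * math.sin(2 * math.pi * h / 24 - math.pi / 2) for h in HOURS]
print(annotate(unlabeled, reference))  # temperature
```

Z-normalization makes the comparison depend on the shape of a stream rather than on its units or offset, which matters when contributors report the same quantity on different scales. It also discards the value range, which can itself be a useful cue for annotation.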
