A Hierarchical Storage System for Industrial Time-Series Data

Manufacturers' growing interest in monitoring and analyzing industrial systems creates a problem: storing the captured data is costly. This paper presents a three-level hierarchical architecture for time-series data storage in cloud environments that helps to reduce those costs. In the first level, new raw time-series data is stored for a short period of time (e.g., one day) on electronic non-volatile storage such as solid-state drives (SSDs), which provide fast access for real-time visualization of the latest data. In the second level, recent time series are stored for a medium period of time (e.g., one week) on magnetic hard disk drives (HDDs), which are lower-cost devices with slower data transfer speeds. In the third level, a reduced representation of the time series, obtained by applying time-series reduction techniques, is stored on HDDs for a longer period of time (e.g., one year). Working with those reduced representations decreases data storage and transmission costs without limiting the future use of the data in different processes. The architecture has been implemented using the top-ranked database management system (DBMS) from each of three categories: wide column store, time-series DBMS, and graph DBMS. It has been tested with industrial time series from a real manufacturing environment and with three different types of queries proposed by domain experts. The paper reports performance results for storage space, storage costs, and query processing time.
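The three-level scheme described in the abstract can be illustrated with a minimal sketch. It assumes that data moves from one level to the next as it ages, and it uses the abstract's example retention periods (one day, one week, one year). The level labels and the function names `storage_level` and `paa` are hypothetical, and piecewise aggregate approximation (PAA) is used here only as one well-known time-series reduction technique; the abstract does not say which techniques the paper applies.

```python
from datetime import timedelta
from statistics import fmean

# Retention limits taken from the abstract's examples (assumed, tunable).
RAW_SSD_RETENTION = timedelta(days=1)        # level 1: raw data on SSD
RAW_HDD_RETENTION = timedelta(weeks=1)       # level 2: raw data on HDD
REDUCED_HDD_RETENTION = timedelta(days=365)  # level 3: reduced data on HDD


def storage_level(age: timedelta) -> str:
    """Return the (hypothetical) level label for data of the given age."""
    if age < timedelta(0):
        raise ValueError("age must be non-negative")
    if age <= RAW_SSD_RETENTION:
        return "level1_raw_ssd"
    if age <= RAW_HDD_RETENTION:
        return "level2_raw_hdd"
    if age <= REDUCED_HDD_RETENTION:
        return "level3_reduced_hdd"
    return "expired"


def paa(values: list[float], segments: int) -> list[float]:
    """Piecewise aggregate approximation: the mean of each of
    `segments` near-equal consecutive chunks of `values`."""
    n = len(values)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(values)")
    # Integer chunk boundaries; every chunk is non-empty because segments <= n.
    bounds = [i * n // segments for i in range(segments + 1)]
    return [fmean(values[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(storage_level(timedelta(hours=3)))     # level1_raw_ssd
    print(storage_level(timedelta(days=30)))     # level3_reduced_hdd
    print(paa([1, 3, 5, 7, 2, 2, 10, 20], 4))    # [2.0, 6.0, 2.0, 15.0]
```

In this sketch, eight raw samples are reduced to four values, which halves what the third level has to store. A real deployment would choose the reduction technique and ratio according to the queries the reduced data must still support.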
