A Schema Generator for Collected Data from Wearable Devices for Reliable Data Ingestion

We live in an era in which smart devices such as smartphones and smartwatches are all around us. The data these devices produce is important, whether it is used to provide services or for analysis. The devices generate data in large volumes and in many different formats, which makes data ingestion challenging: the sheer amount of data complicates both structuring and storage. Furthermore, because the formats differ and device behavior is unpredictable, the data must also be validated. Although many data ingestion tools are available for structuring data, validating it remains a major challenge. We therefore propose a Schema Generator that automatically generates a schema from the input file of a new device added to the system and validates this schema against the input file using an available ingestion tool. In addition, we provide a user interface for creating the configuration file of the ingestion tool, since writing this file by hand for each new device is a strenuous process.
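The idea of generating a schema from a sample input file and then validating records against it can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the device data is available as flat JSON-like records, and the names `type_name`, `infer_schema`, and `validate`, as well as the sample fields, are hypothetical and chosen only for illustration.

```python
def type_name(value):
    """Map a Python value to a simple schema type label (illustrative labels)."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records):
    """Infer field types and required flags from a list of sample records."""
    types_seen = {}
    for record in records:
        for field, value in record.items():
            types_seen.setdefault(field, set()).add(type_name(value))
    # A field counts as required only if every sample record contains it
    required = set.intersection(*(set(r) for r in records)) if records else set()
    return {
        field: {"types": sorted(types), "required": field in required}
        for field, types in types_seen.items()
    }


def validate(record, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, rule in schema.items():
        if field not in record:
            if rule["required"]:
                errors.append(f"missing field: {field}")
            continue
        found = type_name(record[field])
        if found not in rule["types"]:
            errors.append(f"{field}: expected {rule['types']}, got {found}")
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


# Hypothetical sample records from a newly added wearable device
samples = [
    {"device_id": "w1", "heart_rate": 72, "ts": 1.5},
    {"device_id": "w2", "heart_rate": 80},
]
schema = infer_schema(samples)
problems = validate({"device_id": 5}, schema)
```

In this sketch a field is marked required only when it appears in every sample, so the inferred schema is only as good as the sample file. A production system would more likely delegate validation to the ingestion tool itself, as the abstract describes.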
