Monitoring Data Integrity in Big Data Analytics Services

Enabled by advances in Cloud technologies, Big Data Analytics Services (BDAS) can improve many processes and extract additional information from previously untapped data sources. As our experience with BDAS and their benefits grows, and as the technology for obtaining even more data improves, BDAS become ever more important for many different domains and for our daily lives. Most efforts to improve BDAS technologies have focused on scaling and efficiency. However, an equally important property is security, especially as we increasingly use public Cloud infrastructures instead of private ones. In this paper we present our approach for strengthening BDAS security by modifying the popular Spark infrastructure so that it monitors, at run-time, the integrity of the data being manipulated. In this way, we can ensure that the results of the complex and resource-intensive computations performed on the Cloud are based on correct data, and not on data that have been tampered with or modified through faults in one of the many complex subsystems of the overall system.
