Bio-inspired Think-and-Share Optimization for Big Data Provenance in Wireless Sensor Networks

Enterprises are increasingly adopting big data systems and applications to manage data-driven processes, practices, and systems in an enterprise-wide context. Specifically, big data systems and their underlying applications empower enterprises with analytical decision making (e.g., recommender and decision support systems) to optimize organizational productivity, competitiveness, and growth. Despite these benefits, big data applications face challenges that include, but are not limited to, the security, privacy, authenticity, and reliability of critical data; failures in any of these may propagate false information across systems. Data provenance, as an approach and enabling mechanism to identify the origin, manage the creation, and track the propagation of information, can address these challenges for data management in an enterprise context. Data provenance solutions can help stakeholders and enterprises assess the quality of data, along with the authenticity, reliability, and trustworthiness of information, on the basis of the identity, reproducibility, and integrity of data. Considering the widespread adoption of big data applications and the need for data provenance, this paper focuses on (i) analyzing the state of the art to give a holistic presentation of provenance in big data applications, and (ii) proposing a bio-inspired approach, with an underlying algorithm that mimics human thinking, to support data provenance in Wireless Sensor Networks (WSNs). The proposed ‘Think-and-Share Optimization’ (TaSO) algorithm modularizes and automates data provenance in WSNs that are deployed and operated in enterprises. Evaluation of the TaSO algorithm demonstrates its efficiency in terms of connectivity, closeness to the sink node, coverage, and execution time. The proposed research contextualizes bio-inspired computation to enable and optimize data provenance in WSNs.
Future research aims to exploit machine learning techniques (with underlying algorithms) to automate data provenance for big data systems in networked environments.
