The big data system, components, tools, and technologies: a survey

Traditional databases cannot handle unstructured data or high volumes of real-time data. Diverse, unstructured datasets give rise to big data, and it is laborious to store, manage, process, analyze, visualize, and extract useful insights from these datasets using traditional database approaches. Refining large heterogeneous datasets, however, involves many technical aspects. This paper presents a generalized view of a complete big data system, covering the stages of big data processing and the key components of each stage. In particular, we compare and contrast various distributed file systems and MapReduce-supported NoSQL databases with respect to selected parameters of the data management process. Further, we present distinct distributed/cloud-based machine learning (ML) tools that play a key role in designing, developing, and deploying data models. The paper investigates case studies on distributed ML tools such as Mahout, Spark MLlib, and FlinkML. We then classify analytics by type of data, domain, and application. We distinguish various visualization tools with respect to three parameters: functionality, analysis capabilities, and supported development environment. Furthermore, we systematically and comparatively investigate big data tools and technologies (Hadoop 3.0, Spark 2.3), including distributed/cloud-based stream processing tools. Moreover, we discuss the functionality of several SQL query tools on Hadoop based on 10 parameters. Finally, we present some critical points relevant to research directions and opportunities in light of current big data trends. Investigating infrastructure tools for big data together with recent developments provides a better understanding of how different tools and technologies are applied to real-life applications.
