Apache Spark and Apache Ignite Performance Analysis

Big Data is a current research topic. It is increasingly part of people's lives through applications that are used daily, in areas such as stock exchanges, news, social media, and health care. All these applications rely on Big Data technologies for storing and processing information. Numerous technologies have been developed to meet Big Data requirements, and it is worth examining their strengths and weaknesses, when to use one over another, and how well they perform in different situations. In this paper, we compare two data processing frameworks, Apache Spark and Apache Ignite. The comparison covers the following aspects: features, implementation, architecture, and performance metrics. To test performance, we used two popular applications, word count and k-means clustering. The results show that Spark achieved better performance than Ignite.
