Comparative Research on Active Learning of Big Data Based on MapReduce and Spark

Abstract The big data era has arrived, and it is transforming science, engineering, medicine, healthcare, finance, business, and ultimately society as a whole. Traditional data-processing tools may not be able to handle such large volumes of data for analysis. Big data can appear in three formats: structured, unstructured, and semi-structured. MapReduce and Spark are the two most popular open-source frameworks for large-scale data analysis, and their relative performance varies with the application being implemented. MapReduce is Hadoop's programming model for processing large datasets stored in the Hadoop Distributed File System (HDFS), and it is one of the core components of the Hadoop framework. Because MapReduce is designed for batch processing of large volumes of data, it is poorly suited to real-time data processing. Apache Spark is a data processing framework that can quickly perform tasks on large datasets and distribute those tasks across multiple computers, either on its own or together with other distributed computing tools. MapReduce's processing time is about two times that of Spark. Thus, Spark can be deployed quickly, and Apache Spark proves more effective than MapReduce for data processing.
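The MapReduce programming model described above can be illustrated with a minimal, single-process Python sketch of the classic word-count job. It only simulates the three logical stages (map, shuffle, reduce) in memory; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and are not part of Hadoop's Java API, and there is no HDFS, no cluster, and no fault tolerance here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map (hypothetical helper): emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle (hypothetical helper): group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce (hypothetical helper): combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input records in a single process."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    records = ["Spark and MapReduce", "MapReduce writes to disk", "Spark keeps data in memory"]
    print(word_count(records))
```

In a real Hadoop job, each stage runs in parallel on many nodes and intermediate results are written to disk between jobs. Spark expresses the same computation as chained RDD transformations (`flatMap`, `map`, `reduceByKey`) that can keep intermediate data in memory, which is one commonly cited reason for its speed advantage in iterative workloads.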
