Baran: An Effective MapReduce-Based Solution to Solve Big Data Problems

MapReduce is widely used to solve big data problems on distributed hardware platforms. However, MapReduce architectures suffer from inefficiencies, chiefly poor data locality, network congestion, and low hardware performance. In this chapter, the authors introduce Baran, a method that addresses these problems. If an algorithm satisfies Baran's conditions, the method can dramatically improve its performance by solving the data locality problem and its consequences, such as network congestion and low hardware performance. The authors apply Baran to previous works on data warehouse, graph, and data mining problems. The results show that an algorithm to which Baran has been applied can be solved properly on the MapReduce architecture.
