Paper Information - Evaluating New Approaches of Big Data Analytics Frameworks

Evaluating New Approaches of Big Data Analytics Frameworks

Big data will be one of the leading growth markets in information technology in the coming years. One problem in this area is the efficient computation of huge data volumes, especially for complex algorithms in data mining and machine learning tasks. This paper discusses new processing frameworks for big and smart data in distributed environments and presents a benchmark comparing two of them, Apache Flink and Apache Spark, based on a mixed workload of algorithms from different analytics areas applied to different real-world datasets.
