Survey on Frameworks for Distributed Computing: Hadoop, Spark and Storm

The storage and management of information has always been a challenge for software engineering. New programming approaches had to be found: parallel processing and, later, distributed computing models were developed, along with programming frameworks to assist software developers. This is where the Hadoop framework, an open-source implementation of the MapReduce programming model that also takes advantage of a distributed file system, takes the lead. Since its introduction, however, MapReduce has evolved, and new programming models introduced by the Spark and Storm frameworks show promising results.
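To make the MapReduce model concrete, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce), independent of any framework; the word-count task and all function names here are illustrative assumptions, not taken from the survey.

```python
from collections import defaultdict

# Minimal, single-process sketch of the MapReduce model:
# a map phase emits (key, value) pairs, a shuffle groups
# them by key, and a reduce phase aggregates each group.
# In Hadoop, the framework runs these phases in parallel
# across a cluster and reads input from the distributed
# file system; this sketch only shows the data flow.

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values observed for one key; here, a word count."""
    return (key, sum(values))

documents = ["Hadoop implements MapReduce", "Spark extends MapReduce"]
pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # e.g. {'hadoop': 1, 'implements': 1, 'mapreduce': 2, ...}
```

Spark and Storm generalize this flow: Spark keeps intermediate data in memory across a chain of such transformations, while Storm applies the same map-style operators continuously over unbounded streams rather than over a fixed input set.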
