Big Data Management Performance Evaluation in Hadoop Ecosystem

As research on big data management has advanced, many components for big data management have been developed. Built on the Hadoop platform, these components provide solutions at different levels, and the Hadoop ecosystem has gradually taken shape. However, users often lack knowledge of the features of these components, such as their I/O patterns, capabilities, and application scenarios. When dealing with big data problems, users often choose components based on experience, and this leads to a mismatch between the demands and the management tools, so the platform cannot deliver its optimal performance. To address this issue, this paper tests and evaluates several widely used big data management tools in the Hadoop ecosystem at three levels: distributed file system, NoSQL database, and SQL-like component. After a brief introduction to the typical management tools, comprehensive comparisons of tools at the same level are carried out. Their advantages and disadvantages are discussed, and their performance is tested and analyzed.
