Performance analysis of extract, transform, load (ETL) in Apache Hadoop atop NAS storage using iSCSI
Data analytics has become a key element of business decision-making over the last decade. Extract, transform, load (ETL) is the process of migrating data from a source into a target database, where huge amounts of structured and unstructured data are stored and processed for complex business analysis. Standard ETL tools do not handle this workload efficiently, and improving ETL performance can yield a better return on a company's investment. It is therefore interesting to explore building computing and storage devices from low-cost, low-power components to run the ETL process. In this paper, we propose running Hadoop on Network Attached Storage (NAS) accessed through iSCSI over Ethernet to process ETL, and we investigate the benefits of running Hadoop over NAS storage compared with standard HDFS, using a benchmark that measures extract, transform, and load performance. The experiments used 1 NameNode, 4 DataNodes, NAS storage, and a dataset to evaluate the proposed architecture. The results show that the proposed architecture can use low-cost components to deliver scalable performance and could become a storage solution in the Big Data space.