Performance improvement of I/O intensive OLAP with dynamic control of file storing location

Large-scale data-intensive applications, such as online analytical processing (OLAP) and mining of big data, have become some of the most important applications in recent years. Improving sequential I/O performance is essential to improving the performance of these applications, because they access storage devices sequentially. To this end, we have proposed two methods for optimizing file location on zone bit recording (ZBR) HDDs. In this paper, we focus on these methods and OLAP, and investigate how effective the methods are for large-scale data-intensive applications. First, we introduce the methods for improving sequential I/O performance in large-scale distributed filesystems and distributed processing. Second, we apply the method to the popular OLAP benchmark TPC-H and discuss its applicability to OLAP. We then demonstrate that the method can reduce query processing time.
