Paper information - The impact of cluster characteristics on HiveQL query optimization

Huge amounts of data are stored by many kinds of applications for further analysis. Relational databases have served as data stores for decades, but in some cases they are not suitable for Big Data processing. Non-relational databases were developed to address this problem, along with tools that help transfer data from relational to non-relational databases. This paper presents one such tool, Sqoop. Query optimization must be addressed by every application that deals with large amounts of data, regardless of its field of application and scope. The paper also analyzes the impact of cluster characteristics on HiveQL query optimization.

Ognjen V. Joldzic | Dijana R. Vukovic
