Interactive Entity Centric Analysis of Log Data
Interactive entity-centric analysis of log data can yield fine-grained business insights. In this paper, we first describe a fiber-based partitioning method for log data, which accelerates subsequent entity-centric analysis. Second, we present our fiber-based partitioner, which is used by the Spark SQL query engine. The fiber-based partitioner takes the locations of data blocks into account both when loading data from HDFS into RDDs and when shuffling data from upstream operators to downstream operators during joins; this avoids data exchange between nodes and speeds up query processing. Finally, we present experimental results demonstrating that the fiber-based partitioner improves the performance of entity-centric queries.
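The idea of keeping all records of an entity in the same partition, so that joins can run without moving data between nodes, can be illustrated with a minimal sketch. This is not the paper's partitioner and does not use Spark or HDFS: the function names (`fiber_partition`, `partition_by_entity`, `local_join`), the CRC32 hash, the partition count, and the sample records are all assumptions made for illustration, and block-location awareness is not modeled.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def fiber_partition(entity_id, num_partitions=NUM_PARTITIONS):
    """Map an entity ID to a partition index using a stable hash (CRC32)."""
    return zlib.crc32(entity_id.encode("utf-8")) % num_partitions


def partition_by_entity(records, num_partitions=NUM_PARTITIONS):
    """Group (entity_id, payload) records so each entity's records
    (its "fiber") end up together in exactly one partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for entity_id, payload in records:
        index = fiber_partition(entity_id, num_partitions)
        partitions[index][entity_id].append(payload)
    return partitions


def local_join(left_parts, right_parts):
    """Join two datasets that were partitioned by the same function.
    Matching entities share a partition index, so each partition pair
    is joined on its own and no records cross partitions."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        for entity_id in left.keys() & right.keys():
            for left_payload in left[entity_id]:
                for right_payload in right[entity_id]:
                    joined.append((entity_id, left_payload, right_payload))
    return joined


# Hypothetical log records keyed by a user ID.
clicks = [("u1", "click:home"), ("u2", "click:cart"), ("u1", "click:pay")]
orders = [("u1", "order:42"), ("u3", "order:77")]

result = local_join(partition_by_entity(clicks), partition_by_entity(orders))
```

Because both datasets use the same partitioning function, the join never needs to look outside a partition pair. In a distributed engine this property is what would let a join skip the shuffle step, assuming the co-partitioned data also resides on the same nodes.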