Paper information - Many-core needs fine-grained scheduling: A case study of query processing on Intel Xeon Phi processors

Abstract Emerging many-core processors feature very high memory bandwidth and computational power. For example, Intel Xeon Phi many-core processors of the Knights Corner (KNC) and Knights Landing (KNL) architectures embrace 60 to 64 x86-based CPU cores with 512-bit SIMD capabilities and high-bandwidth memories like the GDDR5 on KNC and on-package DRAMs on KNL. In this paper, we study the performance of main-memory database operators and online analytical processing (OLAP) on such many-core architectures. We find that even the state-of-the-art database operators suffer severely from memory stalls and resource underutilization on those many-core processors. We argue that a software approach decomposing a coarse-grained operator into fine-grained phases and executing two independent phases with complementary resource requirements concurrently can address this problem. This approach allows more fine-grained control of resource utilization. Our experiments demonstrate significant performance gain and high resource utilization achieved by our proposed approaches on both KNC and KNL.
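The scheduling idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Phase` type, the `memory_bound` flag, the phase and operator names, and the greedy pairing rule are all hypothetical, and "belongs to a different operator" is used here only as a stand-in assumption for "independent". The sketch shows how phases with complementary resource needs might be paired for concurrent execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """A fine-grained phase of a coarse-grained operator (hypothetical type)."""
    name: str
    operator: str       # operator the phase was decomposed from
    memory_bound: bool  # illustrative label, not a measured profile


def pair_complementary(phases):
    """Greedily pair each memory-bound phase with a compute-bound phase.

    Assumption: phases from different operators are independent, while
    phases of the same operator are not. Unpaired phases run alone.
    """
    mem = [p for p in phases if p.memory_bound]
    cpu = [p for p in phases if not p.memory_bound]
    schedule = []
    while mem and cpu:
        m = mem.pop(0)
        # Index of the first compute-bound phase from another operator.
        idx = next(
            (i for i, c in enumerate(cpu) if c.operator != m.operator), None
        )
        if idx is None:
            schedule.append((m,))  # no independent partner available
            continue
        schedule.append((m, cpu.pop(idx)))
    # Whatever is left has no complementary partner and runs alone.
    schedule.extend((p,) for p in mem + cpu)
    return schedule


# Hypothetical phases from three operators; the labels are made up.
phases = [
    Phase("a1", "op_A", memory_bound=True),
    Phase("a2", "op_A", memory_bound=False),
    Phase("b1", "op_B", memory_bound=True),
    Phase("c1", "op_C", memory_bound=False),
]
schedule = pair_complementary(phases)
names = [tuple(p.name for p in slot) for slot in schedule]
```

Each tuple in `schedule` is a slot whose phases would run concurrently. A real scheduler would derive the resource labels from profiling and would check actual data dependencies rather than relying on operator names.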

Bingsheng He | Chiew Tong Lau | Mian Lu | Xuntao Cheng
