Many-core needs fine-grained scheduling: A case study of query processing on Intel Xeon Phi processors

Abstract Emerging many-core processors offer very high memory bandwidth and computational power. For example, Intel Xeon Phi many-core processors of the Knights Corner (KNC) and Knights Landing (KNL) architectures integrate 60 to 64 x86-based CPU cores with 512-bit SIMD capabilities and high-bandwidth memory, such as GDDR5 on KNC and on-package DRAM on KNL. In this paper, we study the performance of main-memory database operators and online analytical processing (OLAP) on such many-core architectures. We find that even state-of-the-art database operators suffer severely from memory stalls and resource underutilization on these processors. We argue that a software approach can address this problem: decompose a coarse-grained operator into fine-grained phases, then execute two independent phases with complementary resource requirements concurrently. This approach allows finer-grained control of resource utilization. Our experiments demonstrate significant performance gains and high resource utilization on both KNC and KNL.
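The idea of co-running phases with complementary resource requirements can be illustrated with a minimal sketch. This is not the paper's scheduler: the `Phase` type, the `"memory"`/`"compute"` labels, and the greedy `pair_phases` function are hypothetical names introduced only for illustration, and the sketch assumes every phase has already been classified and that all phases are mutually independent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    bound: str  # hypothetical label: "memory" or "compute"


def pair_phases(phases):
    """Greedily pair each memory-bound phase with a compute-bound one.

    Assumes all phases are independent, so any two may run concurrently.
    Returns a list of tuples; a 2-tuple is a co-scheduled pair, a 1-tuple
    is a phase that found no complementary partner and runs alone.
    """
    mem = deque(p for p in phases if p.bound == "memory")
    cpu = deque(p for p in phases if p.bound == "compute")
    schedule = []
    while mem and cpu:
        schedule.append((mem.popleft(), cpu.popleft()))
    # At most one of the two queues is non-empty at this point.
    for leftover in list(mem) + list(cpu):
        schedule.append((leftover,))
    return schedule


if __name__ == "__main__":
    demo = [Phase("p1", "memory"), Phase("p2", "compute"), Phase("p3", "memory")]
    for slot in pair_phases(demo):
        print(" + ".join(p.name for p in slot))
```

A real scheduler would also have to decide how to split cores or hardware threads between the two phases in each pair; the sketch only shows the pairing step.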
