An MPSoC for energy-efficient database query processing

This paper presents a heterogeneous database hardware accelerator MPSoC manufactured in 28 nm SLP CMOS. The 18 mm² chip integrates a runtime task scheduling unit for energy-efficient query processing and hierarchical power management supported by ultra-fast dynamic voltage and frequency scaling. Four processing elements, connected by a star-mesh network-on-chip, are accelerated by an instruction set extension tailored to fundamental data-intensive applications. We evaluate the MPSoC with typical database benchmarks focusing on scans and bitmap operations. When the processing elements operate on data stored in local memories, the chip consumes 250 mW and achieves a 96x improvement in energy efficiency over state-of-the-art platforms.
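To make the benchmarked workload concrete, the following is a minimal software sketch of a column scan that produces a bitmap, followed by bitmap AND/OR to combine predicates. It only illustrates the kind of operation being measured, not the chip's instruction set extension or its scheduling. The function names, column names, and data are hypothetical.

```python
def scan_to_bitmap(column, predicate):
    """Scan a column; bit i of the result is set when predicate(column[i]) holds."""
    bitmap = 0
    for i, value in enumerate(column):
        if predicate(value):
            bitmap |= 1 << i
    return bitmap


def bitmap_to_rows(bitmap):
    """Return the row indices whose bits are set, in ascending order."""
    rows = []
    i = 0
    while bitmap:
        if bitmap & 1:
            rows.append(i)
        bitmap >>= 1
        i += 1
    return rows


# Hypothetical two-column table (illustrative data only).
price = [5, 12, 7, 30, 18]
qty = [1, 4, 9, 2, 6]

expensive = scan_to_bitmap(price, lambda p: p > 10)  # scan on the first column
bulk = scan_to_bitmap(qty, lambda q: q >= 4)         # scan on the second column

both = expensive & bulk    # rows matching both predicates
either = expensive | bulk  # rows matching at least one predicate
```

Each scan touches every value in its column once, and combining predicates reduces to bitwise operations on the resulting bitmaps, which is why such workloads are data-intensive rather than compute-intensive.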
