Extreme Datacenter Specialization for Planet-Scale Computing: ASIC Clouds

Planet-scale applications are driving the exponential growth of the cloud, and datacenter specialization is the key enabler of this trend, providing order of magnitudes improvements in cost-effectiveness and energy-efficiency. While exascale computing remains a goal for supercomputing, specialized datacenters have emerged and have demonstrated beyond-exascale performance and efficiency in specific domains. This paper generalizes the applications, design methodology, and deployment challenges of the most extreme form of specialized datacenter: ASIC Clouds. It analyzes two game-changing, real-world ASIC Clouds-Bitcoin Cryptocurrency Clouds and Tensor Processing Clouds-discuss their incentives, the empowering technologies and how they benefit from the specialized ASICs. Their business models, architectures and deployment methods are useful for envisioning future potential ASIC Clouds and forecasting how they will transform computing, the economy and society.

[1]  Michael Bedford Taylor,et al.  A Landscape of the New Dark Silicon Design Regime , 2013, IEEE Micro.

[2]  Stephen Richardson,et al.  Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era , 2016, IEEE Design & Test.

[3]  Steven Swanson,et al.  Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.

[4]  Junichiro Makino,et al.  GRAPE-8 -- An accelerator for gravitational N-body simulation with 20.5Gflops/W performance , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[5]  Margaret Martonosi,et al.  Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[6]  William J. Dally,et al.  Darwin: A Hardware-acceleration Framework for Genomic Sequence Alignment , 2017, bioRxiv.

[7]  Lu Zhang,et al.  Moonwalk: NRE Optimization in ASIC Clouds , 2017, ASPLOS.

[8]  Michael Bedford Taylor,et al.  Bitcoin and the age of Bespoke Silicon , 2013, 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

[9]  David A. Patterson,et al.  In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[10]  Song Han,et al.  EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[11]  Yu Wang,et al.  PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[12]  Nicolas Courtois,et al.  Optimizing SHA256 in Bitcoin Mining , 2014, CSS.

[13]  Joel Emer,et al.  Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.

[14]  A. Ailamaki,et al.  Toward Dark Silicon in Servers , 2011, IEEE Micro.

[15]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[16]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition , 2013, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition.

[17]  Moein Khazraee,et al.  Specializing a Planet's Computation: ASIC Clouds , 2017, IEEE Micro.

[18]  David E. Shaw,et al.  Anton: A specialized ASIC for molecular dynamics , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[19]  Natalie D. Enright Jerger,et al.  Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[20]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[21]  S. Nakamoto,et al.  Bitcoin: A Peer-to-Peer Electronic Cash System , 2008 .

[22]  Eric S. Chung,et al.  A reconfigurable fabric for accelerating large-scale datacenter services , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[23]  Shaoli Liu,et al.  Cambricon: An Instruction Set Architecture for Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[24]  Hari Angepat,et al.  Serving DNNs in Real Time at Datacenter Scale with Project Brainwave , 2018, IEEE Micro.

[25]  Lien-Fei Chen,et al.  18.6 A 0.5nJ/pixel 4K H.265/HEVC codec LSI for multi-format smartphone applications , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[26]  Ozcan Ozturk,et al.  Energy Efficient Architecture for Graph Analytics Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[27]  Engin Ipek,et al.  Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning , 2017, 2017 Fifth Berkeley Symposium on Energy Efficient Electronic Systems & Steep Transistors Workshop (E3S).

[28]  Wenguang Chen,et al.  NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[29]  Thomas F. Wenisch,et al.  Thin servers with smart pipes: designing SoC accelerators for memcached , 2013, ISCA.

[30]  Miao Hu,et al.  ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[31]  Gu-Yeon Wei,et al.  Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[32]  Lien-Fei Chen,et al.  A 0.5 nJ/Pixel 4 K H.265/HEVC Codec LSI for Multi-Format Smartphone Applications , 2016, IEEE Journal of Solid-State Circuits.

[33]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[34]  Michael Bedford Taylor,et al.  Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse , 2012, DAC Design Automation Conference 2012.

[35]  Michael Bedford Taylor,et al.  The Evolution of Bitcoin Hardware , 2017, Computer.

[36]  Shaoli Liu,et al.  Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[37]  L. V. Gutierrez,et al.  ASIC Clouds: Specializing the Datacenter , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[38]  David Blaauw,et al.  GenAx: A Genome Sequencing Accelerator , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[39]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[40]  J. P. Grossman,et al.  The ANTON 2 chip a second-generation ASIC for molecular dynamics , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).

[41]  Monty Denneau,et al.  The GF11 supercomputer , 1985, ISCA '85.

[42]  Ronald G. Dreslinski,et al.  Integrated 3D-stacked server designs for increasing physical density of key-value stores , 2014, ASPLOS.

[43]  Ravi Iyengar,et al.  Goldstrike 1: Cointerra's first generation crypto-currency processor for bitcoin mining machines , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).

[44]  Heiner Litz,et al.  High Frequency Trading Acceleration Using FPGAs , 2011, 2011 21st International Conference on Field Programmable Logic and Applications.

[45]  Sudhakar Yalamanchili,et al.  Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[46]  Karthik Gururaj,et al.  Accelerate Genomics Research with the Broad-Intel Genomics , 2017 .

[47]  Steven Swanson,et al.  Conservation cores: reducing the energy of mature computations , 2010, ASPLOS 2010.

[48]  Babak Falsafi,et al.  Meet the walkers accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[49]  Jinyoung Lee,et al.  Biscuit: A Framework for Near-Data Processing of Big Data Workloads , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).