Paper information - Motivation for and Evaluation of the First Tensor Processing Unit

The first-generation tensor processing unit (TPU) runs deep neural network (DNN) inference 15-30 times faster with 30-80 times better energy efficiency than contemporary CPUs and GPUs in similar semiconductor technologies. This domain-specific architecture (DSA) is a custom chip that has been deployed in Google datacenters since 2015, where it serves billions of people.