In-datacenter performance analysis of a tensor processing unit
Norman P. Jouppi | Cliff Young | Nishant Patil | David A. Patterson | Gaurav Agrawal | Raminder Bajwa | Sarah Bates | Suresh Bhatia | Nan Boden | Al Borchers | Rick Boyle | Pierre-luc Cantin | Clifford Chao | Chris Clark | Jeremy Coriell | Mike Daley | Matt Dau | Jeffrey Dean | Ben Gelb | Tara Vazir Ghaemmaghami | Rajendra Gottipati | William Gulland | Robert Hagmann | C. Richard Ho | Doug Hogberg | John Hu | Robert Hundt | Dan Hurt | Julian Ibarz | Aaron Jaffey | Alek Jaworski | Alexander Kaplan | Harshit Khaitan | Daniel Killebrew | Andy Koch | Naveen Kumar | Steve Lacy | James Laudon | James Law | Diemthu Le | Chris Leary | Zhuyuan Liu | Kyle Lucke | Alan Lundin | Gordon MacKean | Adriana Maggiore | Maire Mahony | Kieran Miller | Rahul Nagarajan | Ravi Narayanaswami | Ray Ni | Kathy Nix | Thomas Norrie | Mark Omernick | Narayana Penukonda | Andy Phelps | Jonathan Ross | Matt Ross | Amir Salek | Emad Samadiani | Chris Severn | Gregory Sizikov | Matthew Snelham | Jed Souter | Dan Steinberg | Andy Swing | Mercedes Tan | Gregory Thorson | Bo Tian | Horia Toma | Erick Tuttle | Vijay Vasudevan | Richard Walter | Walter Wang | Eric Wilcox | Doe Hyun Yoon
[1] David A. Patterson et al. Computer Architecture: A Quantitative Approach, 1969.
[2] David A. Patterson et al. The case for the reduced instruction set computer, 1980, CARN.
[3] H. T. Kung. Why systolic architectures?, 1982, Computer.
[4] James E. Smith et al. Decoupled access/execute computer architectures, 1984, TOCS.
[5] D. Hammerstrom et al. A VLSI architecture for high-performance, low-cost, on-chip learning, 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[6] J. Beichter et al. Design of a 1st Generation Neurocomputer, 1991.
[7] David A. Patterson et al. Computer Architecture: A Quantitative Approach, 5th Edition, 1996.
[8] Paolo Ienne et al. Special-purpose digital hardware for neural networks: An architectural survey, 1996, J. VLSI Signal Process.
[9] Michele Ruggiero Banish et al. Neural network processor, 2004, SPIE Optics + Photonics.
[10] David A. Patterson et al. Latency lags bandwidth, 2004, CACM.
[11] Luiz André Barroso et al. The Case for Energy-Proportional Computing, 2007, Computer.
[12] Yann LeCun et al. CNP: An FPGA-based processor for Convolutional Networks, 2009, 2009 International Conference on Field Programmable Logic and Applications.
[13] Samuel Williams et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[14] Klaus-Dieter Lange et al. Identifying Shades of Green: The SPECpower Benchmarks, 2009, Computer.
[15] Srihari Cadambi et al. A dynamically configurable coprocessor for convolutional neural networks, 2010, ISCA.
[16] Berin Martini et al. NeuFlow: A runtime reconfigurable dataflow processor for vision, 2011, CVPR 2011 Workshops.
[17] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[18] Henk Corporaal et al. Memory-centric accelerator design for Convolutional Neural Networks, 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).
[19] Luiz André Barroso et al. The tail at scale, 2013, CACM.
[20] Christoforos E. Kozyrakis et al. Convolution engine: balancing efficiency & flexibility in specialized computing, 2013, ISCA.
[21] Ninghui Sun et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.
[22] Jia Wang et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[23] Song Han et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[24] Karin Strauss et al. Accelerating Deep Convolutional Neural Networks Using Specialized Hardware, 2015.
[25] Xuehai Zhou et al. PuDianNao: A Polyvalent Machine Learning Accelerator, 2015, ASPLOS.
[26] Jason Cong et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[27] Karin Strauss et al. Toward accelerating deep learning at scale using specialized hardware in the datacenter, 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[28] Pritish Narayanan et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[29] Tianshi Chen et al. ShiDianNao: Shifting vision processing closer to the sensor, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[30] Luca Benini et al. Origami: A Convolutional Network Accelerator, 2015, ACM Great Lakes Symposium on VLSI.
[31] Dumitru Erhan et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Eric S. Chung et al. A reconfigurable fabric for accelerating large-scale datacenter services, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[33] Michael S. Bernstein et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[34] Reginald Clifford Young. Batch processing in a neural network processor, 2016.
[35] Jeffrey Dean et al. Large-Scale Deep Learning For Building Intelligent Computer Systems, 2016, WSDM.
[36] Song Han et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[37] Gregory Michael Thorson et al. Vector computation unit in a neural network processor, 2016.
[38] Yu Wang et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[39] Joel Emer et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks, 2016, CARN.
[40] Kurt Keutzer et al. If I could only design one circuit ...: technical perspective, 2016, Communications of the ACM.
[41] Natalie D. Enright Jerger et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[42] Lin Zhong et al. RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[43] Jian Sun et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[44] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[45] Hari Angepat et al. A cloud-scale acceleration architecture, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[46] Miao Hu et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[47] Gu-Yeon Wei et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[48] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[49] Gu-Yeon Wei et al. Fathom: reference workloads for modern deep learning methods, 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[50] Sudhakar Yalamanchili et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[51] Dong Han et al. Cambricon: An Instruction Set Architecture for Neural Networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[52] George Kurian et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[53] Krste Asanović. Programmable Neurocomputing.
[54] Wang et al. In-Datacenter Performance Analysis of a Tensor Processing Unit™.