DaDianNao: A Machine-Learning Supercomputer
Jia Wang | Tianshi Chen | Zhiwei Xu | Olivier Temam | Ling Li | Shaoli Liu | Ninghui Sun | Liqiang He | Yunji Chen | Tao Luo | Shijin Zhang
[1] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[2] Simon Haykin,et al. Gradient-Based Learning Applied to Document Recognition , 2001 .
[3] William J. Dally,et al. Principles and Practices of Interconnection Networks , 2004 .
[4] Richard E. Matick,et al. Logic-based eDRAM: Origins and rationale for use , 2005, IBM J. Res. Dev..
[5] P. K. Dubey,et al. Recognition, Mining and Synthesis Moves Computers to the Era of Tera , 2005 .
[6] Yoshua Bengio,et al. An empirical evaluation of deep architectures on problems with many factors of variation , 2007, ICML '07.
[7] David E. Shaw,et al. Anton: A specialized ASIC for molecular dynamics , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[8] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] Johannes Schemmel,et al. Wafer-scale integration of analog neural networks , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).
[10] Luis A. Plana,et al. SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).
[11] Scott A. Mahlke,et al. Bridging the computation gap between programmable processors and hardwired accelerators , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[12] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[13] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[14] Yann LeCun,et al. Learning long‐range vision for autonomous off‐road driving , 2009, J. Field Robotics.
[15] K. McStay,et al. Scaling deep trench based eDRAM on SOI to 32nm and Beyond , 2009, 2009 IEEE International Electron Devices Meeting (IEDM).
[16] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[17] Steven Swanson,et al. QSCORES: Trading dark silicon for scalable energy efficiency with quasi-specific cores , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Berin Martini,et al. NeuFlow: A runtime reconfigurable dataflow processor for vision , 2011, CVPR 2011 WORKSHOPS.
[19] Dharmendra S. Modha,et al. A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).
[20] Luca Maria Gambardella,et al. Flexible, High Performance Convolutional Neural Networks for Image Classification , 2011, IJCAI.
[21] Mikko H. Lipasti,et al. A case for neuromorphic ISAs , 2011, ASPLOS XVI.
[22] S. Natarajan,et al. A high-performance, high-density 28nm eDRAM technology with high-K/metal-gate , 2011, 2011 International Electron Devices Meeting.
[23] Mikko H. Lipasti,et al. Automatic abstraction and fault tolerance in cortical microarchitectures , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[24] Vincent Vanhoucke,et al. Improving the speed of neural networks on CPUs , 2011 .
[25] Andrew B. Kahng,et al. ORION 2.0: A Power-Area Simulator for Interconnection Networks , 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[26] Yasuhisa Shimazaki,et al. A 0.41µA standby leakage 32Kb embedded SRAM with Low-Voltage resume-standby utilizing all digital current comparator in 28nm HKMG CMOS , 2012, 2012 Symposium on VLSI Circuits (VLSIC).
[27] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[28] Olivier Temam,et al. A defect-tolerant accelerator for emerging high-performance applications , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[29] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[30] Srihari Cadambi,et al. A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification , 2012, TACO.
[31] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[32] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[33] David A. Ferrucci,et al. Introduction to "This is Watson" , 2012, IBM J. Res. Dev..
[34] Geoffrey E. Hinton,et al. Learning to Label Aerial Images from Noisy Data , 2012, ICML.
[35] Geoffrey E. Hinton,et al. An Efficient Learning Procedure for Deep Boltzmann Machines , 2012, Neural Computation.
[36] Nong Xiao,et al. Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support , 2012, IEEE Transactions on Computers.
[37] Burkhard D. Steinmacher-Burow,et al. The IBM Blue Gene/Q Interconnection Fabric , 2012, IEEE Micro.
[38] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[39] Tara N. Sainath,et al. Improving deep neural networks for LVCSR using rectified linear units and dropout , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[40] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[41] Zheng Li,et al. Continuous real-world inputs can open up alternative accelerator designs , 2013, ISCA.
[42] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[43] Christoforos E. Kozyrakis,et al. Convolution engine: balancing efficiency & flexibility in specialized computing , 2013, ISCA.
[44] Giuseppe Caire,et al. Compute-and-Forward Strategies for Cooperative Distributed Antenna Systems , 2012, IEEE Transactions on Information Theory.
[45] Larry P. Heck,et al. Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.
[46] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.