SynergyFlow
Jiajun Li | Wenyan Lu | Xiaowei Li | Jingya Wu | Guihai Yan | Shijun Gong | Shuhao Jiang
[1] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, ISCA.
[3] Wenhao Huang, et al. Deep process neural network for temporal deep learning, 2014, IJCNN.
[4] William J. Dally, et al. SCNN: An accelerator for compressed-sparse convolutional neural networks, 2017, ISCA.
[5] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.
[6] Keqin Li. Optimal Partitioning of a Multicore Server Processor, 2012, IPDPS Workshops.
[7] Larry P. Heck, et al. Learning deep structured semantic models for web search using clickthrough data, 2013, CIKM.
[8] Samuel Williams, et al. Auto-tuning performance on multicore computers, 2008.
[9] Yoshua Bengio, et al. An empirical evaluation of deep architectures on problems with many factors of variation, 2007, ICML.
[10] Xin He, et al. NNest: Early-Stage Design Space Exploration Tool for Neural Network Inference Accelerators, 2018, ISLPED.
[11] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[12] Srihari Cadambi, et al. A programmable parallel accelerator for learning and classification, 2010, PACT.
[13] Xuan Zhang, et al. Joint Design of Training and Hardware Towards Efficient and Accuracy-Scalable Neural Network Inference, 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[14] Peter J. Haas, et al. Large-scale matrix factorization with distributed stochastic gradient descent, 2011, KDD.
[15] Hyoukjun Kwon, et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.
[16] Xiaowei Li, et al. An Analytical Framework for Estimating Scale-Out and Scale-Up Power Efficiency of Heterogeneous Manycores, 2016, IEEE Transactions on Computers.
[17] Henk Corporaal, et al. Memory-centric accelerator design for Convolutional Neural Networks, 2013, ICCD.
[18] Srihari Cadambi, et al. A dynamically configurable coprocessor for convolutional neural networks, 2010, ISCA.
[19] Yu Cao, et al. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks, 2016, FPGA.
[20] Christoforos E. Kozyrakis, et al. Understanding sources of inefficiency in general-purpose chips, 2010, ISCA.
[21] Natalie D. Enright Jerger, et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, 2016, ISCA.
[22] Leibo Liu, et al. Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns, 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[23] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[24] Yu Wang, et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, 2016, FPGA.
[25] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[26] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[27] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[28] Tianshi Chen, et al. ShiDianNao: Shifting vision processing closer to the sensor, 2015, ISCA.
[29] Geoffrey E. Hinton, et al. Learning to Label Aerial Images from Noisy Data, 2012, ICML.
[30] Yann LeCun, et al. CNP: An FPGA-based processor for Convolutional Networks, 2009, FPL.
[31] Harry Dwyer, et al. An out-of-order superscalar processor with speculative execution and fast, precise interrupts, 1992, MICRO.
[32] Xuehai Zhou, et al. PuDianNao: A Polyvalent Machine Learning Accelerator, 2015, ASPLOS.
[33] Xin He, et al. AxTrain: Hardware-Oriented Neural Network Training for Approximate Inference, 2018, ISLPED.
[34] Samuel Williams, et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures, 2008.
[35] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[36] Ming Yang, et al. 3D Convolutional Neural Networks for Human Action Recognition, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[38] Vincent Vanhoucke, et al. Improving the speed of neural networks on CPUs, 2011.
[39] Jia Wang, et al. DaDianNao: A Machine-Learning Supercomputer, 2014, MICRO.
[40] Hoi-Jun Yoo, et al. DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks, 2017, ISSCC.
[41] Richard W. Vuduc, et al. Balance Principles for Algorithm-Architecture Co-Design, 2011, HotPar.
[42] Zhen Lin, et al. Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems, 2014, APSys.
[43] Scott A. Mahlke, et al. Bridging the computation gap between programmable processors and hardwired accelerators, 2009, HPCA.
[44] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[45] Yu Wang, et al. Training itself: Mixed-signal training acceleration for memristor-based neural network, 2014, ASP-DAC.
[46] Jiajun Li, et al. SmartShuttle: Optimizing off-chip memory accesses for deep learning accelerators, 2018, DATE.
[47] Xiaowei Li, et al. CCR: A concise convolution rule for sparse neural network accelerators, 2018, DATE.
[48] Srihari Cadambi, et al. A Massively Parallel Coprocessor for Convolutional Neural Networks, 2009, ASAP.
[49] Xiaowei Li, et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks, 2017, HPCA.
[50] Shaoli Liu, et al. Cambricon-X: An accelerator for sparse neural networks, 2016, MICRO.
[51] Manoj Alwani, et al. Fused-layer CNN accelerators, 2016, MICRO.
[52] Luis Ceze, et al. Neural Acceleration for General-Purpose Approximate Programs, 2014, IEEE Micro.
[53] Tara N. Sainath, et al. Improving deep neural networks for LVCSR using rectified linear units and dropout, 2013, ICASSP.
[54] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.