Scale-Out Acceleration for Machine Learning
Hadi Esmaeilzadeh | Jongse Park | Divya Mahajan | Hardik Sharma | Joon Kyung Kim | Preston Olds
[1] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.
[2] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Ioannis Kompatsiaris,et al. GPU acceleration for support vector machines , 2011, WIAMIS 2011.
[4] Jacob Nelson,et al. SNNAP: Approximate computing on programmable SoCs via neural acceleration , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[5] Eric S. Chung,et al. LINQits: big data on little clients , 2013, ISCA.
[6] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[7] Wei Zhang,et al. Melia: A MapReduce Framework on OpenCL-Based FPGAs , 2016, IEEE Transactions on Parallel and Distributed Systems.
[8] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[9] Giovanni De Micheli,et al. High Level Synthesis of ASICs under Timing and Synchronization Constraints , 1992 .
[10] Natalia Gimelshein,et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Sayan Mukherjee,et al. Feature Selection for SVMs , 2000, NIPS.
[12] Viktor K. Prasanna,et al. A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[13] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[15] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[16] Ohad Shamir,et al. Optimal Distributed Online Prediction Using Mini-Batches , 2010, J. Mach. Learn. Res..
[17] Pradeep Dubey,et al. Distributed Deep Learning Using Synchronous Stochastic Gradient Descent , 2016, ArXiv.
[18] Kam D. Dahlquist,et al. Regression Approaches for Microarray Data Analysis , 2002, J. Comput. Biol..
[19] Jason Cong,et al. Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale , 2016, SoCC.
[20] Kunle Olukotun,et al. Delite , 2014, ACM Trans. Embed. Comput. Syst..
[21] Bin Zhou,et al. High Frequency Data and Volatility in Foreign Exchange Rates , 2013 .
[22] Jorge Nocedal,et al. Sample size selection in optimization methods for machine learning , 2012, Math. Program..
[23] Srihari Cadambi,et al. An Energy-Efficient Heterogeneous System for Embedded Learning and Classification , 2011, IEEE Embedded Systems Letters.
[24] Gu-Yeon Wei,et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[25] Samy Bengio,et al. Revisiting Distributed Synchronous SGD , 2016, ArXiv.
[26] Bertil Schmidt,et al. MPI-HMMER-Boost: Distributed FPGA Acceleration , 2007, J. VLSI Signal Process..
[27] Steve Poole,et al. An Implementation of the Conjugate Gradient Algorithm on FPGAs , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.
[28] John Langford,et al. Slow Learners are Fast , 2009, NIPS.
[29] Joel Praveen Pinto,et al. Multilayer Perceptron Based Hierarchical Acoustic Modeling for Automatic Speech Recognition , 2010 .
[30] Srihari Cadambi,et al. A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines , 2009, 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines.
[31] Gideon S. Mann,et al. Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models , 2009, NIPS.
[32] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[33] Srihari Cadambi,et al. A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification , 2012, TACO.
[34] Christopher Ré,et al. Towards a unified architecture for in-RDBMS analytics , 2012, SIGMOD Conference.
[35] Huseyin Seker,et al. FPGA implementation of K-means algorithm for bioinformatics application: An accelerated approach to clustering Microarray data , 2011, 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS).
[36] Alexander J. Smola,et al. Efficient mini-batch training for stochastic optimization , 2014, KDD.
[37] Henry Hoffmann,et al. Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.
[38] Xuehai Zhou,et al. PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.
[39] Hadi Esmaeilzadeh,et al. TABLA: A unified template-based framework for accelerating statistical machine learning , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[40] Berin Martini,et al. Large-Scale FPGA-based Convolutional Networks , 2011 .
[41] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[42] Hari Angepat,et al. A cloud-scale acceleration architecture , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[43] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[44] Natalie D. Enright Jerger,et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[45] Tsutomu Maruyama. Real-time K-Means Clustering for Color Images on Reconfigurable Hardware , 2006, 18th International Conference on Pattern Recognition (ICPR'06).
[46] Berin Martini,et al. NeuFlow: A runtime reconfigurable dataflow processor for vision , 2011, CVPR 2011 WORKSHOPS.
[47] Elias S. Manolakos,et al. Parallel architectures for the kNN classifier -- design of soft IP cores and FPGA implementations , 2013, TECS.
[48] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[49] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.
[50] Avleen Singh Bijral,et al. Mini-Batch Primal and Dual Methods for SVMs , 2013, ICML.
[51] John Wawrzynek,et al. High Level Synthesis with a Dataflow Architectural Template , 2016, ArXiv.
[52] E. Lander,et al. Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.
[53] Ranga Vemuri,et al. An Integrated Partitioning and Synthesis System for Dynamically Reconfigurable Multi-FPGA Architectures , 1998, IPPS/SPDP Workshops.
[54] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[55] Ulf Lorenz,et al. Parallel Brutus: the first distributed, FPGA accelerated chess program , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[56] Dong Yu,et al. On parallelizability of stochastic gradient descent for speech DNNs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[57] Wenguang Chen,et al. NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[58] Christoforos Kachris,et al. High-level synthesizable dataflow MapReduce accelerator for FPGA-coupled data centers , 2015, 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS).
[59] Rakesh Kumar,et al. A hardware acceleration technique for gradient descent and conjugate gradient , 2011, 2011 IEEE 9th Symposium on Application Specific Processors (SASP).
[60] Ohad Shamir,et al. Better Mini-Batch Algorithms via Accelerated Gradient Methods , 2011, NIPS.
[61] Christos-Savvas Bouganis,et al. A Heterogeneous FPGA Architecture for Support Vector Machine Training , 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.
[62] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[63] Patrick Judd,et al. Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[64] Tsvi Kuflik,et al. Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011) : 27th October 2011, Chicago, IL, USA , 2011 .
[65] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[66] Abel G. Silva-Filho,et al. Hyperspectral images clustering on reconfigurable hardware using the k-means algorithm , 2003, 16th Symposium on Integrated Circuits and Systems Design, 2003. SBCCI 2003. Proceedings..
[67] Elias S. Manolakos,et al. IP-cores design for the kNN classifier , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.
[68] V. S. Kumari Roshni,et al. Comparison of various texture classification methods using multiresolution analysis and linear regression modelling , 2016, SpringerPlus.
[69] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[70] George A. Constantinides,et al. A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation for Dense Matrices , 2010, TRETS.
[71] Susan J. Eggers,et al. CHiMPS: A C-level compilation flow for hybrid CPU-FPGA architectures , 2008, FPL.
[72] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.