On optimizing machine learning workloads via kernel fusion
Shirish Tatikonda | Berthold Reinwald | P. Sadayappan | Matthias Boehm | Arash Ashari | John Keenleyside | Keith Campbell
[1] Toby Sharp, et al. Implementing Decision Trees and Forests on a GPU, 2008, ECCV.
[2] Eric R. Ziegel, et al. Generalized Linear Models, 2002, Technometrics.
[3] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.
[4] Rajat Raina, et al. Large-scale deep unsupervised learning using graphics processors, 2009, ICML '09.
[5] Clément Farabet, et al. Torch7: A Matlab-like Environment for Machine Learning, 2011, NIPS 2011.
[6] Shirish Tatikonda, et al. SystemML: Declarative machine learning on MapReduce, 2011, 2011 IEEE 27th International Conference on Data Engineering.
[7] B. Ribeiro, et al. GPUMLib: An Efficient Open-Source GPU Machine Learning Library, 2011.
[8] Noel Lopes, et al. GPUMLib: A new Library to combine Machine Learning algorithms with Graphics Processing Units, 2010, 2010 10th International Conference on Hybrid Intelligent Systems.
[9] John Canny, et al. BIDMach: Large-scale Learning with Zero Memory Allocation, 2013.
[10] Roy H. Campbell, et al. A Parallel Implementation of K-Means Clustering on GPUs, 2008, PDPTA.
[11] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[12] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.
[13] Olivier Chapelle, et al. Training a Support Vector Machine in the Primal, 2007, Neural Computation.
[14] Razvan Pascanu, et al. Theano: A CPU and GPU Math Compiler in Python, 2010, SciPy.
[15] Chia-Hua Ho, et al. Large-scale linear support vector regression, 2012, J. Mach. Learn. Res.
[16] Chih-Jen Lin, et al. Trust Region Newton Method for Logistic Regression, 2008, J. Mach. Learn. Res.
[17] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[18] Christopher Ré, et al. Materialization optimizations for feature selection workloads, 2014, SIGMOD Conference.
[19] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[20] Shirish Tatikonda, et al. Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML, 2014, Proc. VLDB Endow.
[21] John F. Canny, et al. Big data analytics with small footprint: squaring the cloud, 2013, KDD.
[22] Kurt Keutzer, et al. Fast support vector machine training and classification on graphics processors, 2008, ICML '08.
[23] P. Baldi, et al. Searching for exotic particles in high-energy physics with deep learning, 2014, Nature Communications.