Towards a Learning-Based Performance Modeling for Accelerating Deep Neural Networks