Wenqi Jiang | Zhenhao He | Shuai Zhang | Thomas B. Preußer | Kai Zeng | Liang Feng | Jiansong Zhang | Tongxuan Liu | Yong Li | Jingren Zhou | Ce Zhang | Gustavo Alonso
[1] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[2] Joseph K. Bradley, et al. Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale, 2016, NIPS.
[3] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[4] Guangwen Yang, et al. F-CNN: An FPGA-based framework for training Convolutional Neural Networks, 2016, 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[5] Paul Covington, et al. Deep Neural Networks for YouTube Recommendations, 2016, RecSys.
[6] Heng-Tze Cheng, et al. Wide & Deep Learning for Recommender Systems, 2016, DLRS@RecSys.
[7] Eric P. Xing, et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server, 2016, EuroSys.
[8] C. Gomez-Uribe, et al. The Netflix Recommender System: Algorithms, Business Value, and Innovation, 2016, ACM Trans. Manag. Inf. Syst..
[9] Olatunji Ruwase, et al. Optimizing CNNs on Multicores for Scalability, Performance and Goodput, 2017, ASPLOS.
[10] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[11] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[12] Gustavo Alonso, et al. Scalable inference of decision tree ensembles: Flexible design for CPU-FPGA platforms, 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[13] Christoforos E. Kozyrakis, et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory, 2017, ASPLOS.
[14] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[15] V. Sze, et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, 2016, IEEE Journal of Solid-State Circuits.
[16] Tat-Seng Chua, et al. Neural Collaborative Filtering, 2017, WWW.
[17] Song Han, et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA, 2016, FPGA.
[18] Gustavo Alonso, et al. FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off, 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[19] Christopher Olston, et al. TensorFlow-Serving: Flexible, High-Performance ML Serving, 2017, ArXiv.
[20] Keith Kim, et al. HBM (High Bandwidth Memory) DRAM Technology and Architecture, 2017, 2017 IEEE International Memory Workshop (IMW).
[21] Ioannis Mitliagkas, et al. Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Amar Phanishayee, et al. Accelerating Deep Learning Workloads Through Efficient Multi-Model Execution, 2018.
[23] InferLine: ML Inference Pipeline Composition Framework, 2018, ArXiv.
[24] Minsik Cho. BlueConnect: Novel Hierarchical All-Reduce on Multi-tired Network for Deep Learning, 2018.
[25] Gustavo Alonso, et al. A Flexible K-Means Operator for Hybrid Databases, 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[26] Hari Angepat, et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave, 2018, IEEE Micro.
[27] Paramvir Bahl, et al. Focus: Querying Large Video Datasets with Low Latency and Low Cost, 2018, OSDI.
[28] Martin D. Schatz, et al. Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications, 2018, ArXiv.
[29] Farinaz Koushanfar, et al. ReBNet: Residual Binarized Neural Network, 2017, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[30] Dimitris S. Papailiopoulos, et al. The Effect of Network Width on the Performance of Large-batch Training, 2018, NeurIPS.
[31] Guorui Zhou, et al. Deep Interest Network for Click-Through Rate Prediction, 2017, KDD.
[32] Noam Shazeer, et al. HydraNets: Specialized Dynamic Architectures for Efficient Inference, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[33] Hadi Esmaeilzadeh, et al. ReLeQ: An Automatic Reinforcement Learning Approach for Deep Quantization of Neural Networks, 2018.
[34] Pradeep Dubey, et al. Mixed Precision Training of Convolutional Neural Networks using Integer Operations, 2018, ICLR.
[35] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018.
[36] Michael I. Jordan, et al. Ray: A Distributed Framework for Emerging AI Applications, 2017, OSDI.
[37] Jie Liu, et al. Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours, 2019, ECML/PKDD.
[38] Zhiru Zhang, et al. Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating, 2019, MICRO.
[39] Nikhil R. Devanur, et al. PipeDream: generalized pipeline parallelism for DNN training, 2019, SOSP.
[40] Carole-Jean Wu, et al. Machine Learning at Facebook: Understanding Inference at the Edge, 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[41] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[42] Chang Zhou, et al. Deep Interest Evolution Network for Click-Through Rate Prediction, 2018, AAAI.
[43] Rudy Lauwereins, et al. Sub-Word Parallel Precision-Scalable MAC Engines for Efficient Embedded DNN Inference, 2019, 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS).
[44] Yibo Zhu, et al. A generic communication scheduler for distributed DNN training acceleration, 2019, SOSP.
[45] Gustavo Alonso, et al. Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning, 2019, Proc. VLDB Endow..
[46] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[47] Yiran Chen, et al. MobiEye: An Efficient Cloud-based Video Detection System for Real-Time Mobile Applications, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[48] William J. Dally, et al. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture, 2019, MICRO.
[49] Gennady Pekhimenko, et al. Priority-based Parameter Propagation for Distributed DNN Training, 2019, SysML.
[50] Minsoo Rhu, et al. TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning, 2019, MICRO.
[51] Dylan Malone Stuart, et al. Laconic Deep Learning Inference Acceleration, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[52] Li Wei, et al. Recommending what video to watch next: a multitask ranking system, 2019, RecSys.
[53] Alexander Aiken, et al. Beyond Data and Model Parallelism for Deep Neural Networks, 2018, SysML.
[54] Yuhao Zhu, et al. ASV: Accelerated Stereo Vision System, 2019, MICRO.
[55] Chaojian Li, et al. HALO: Hardware-Aware Learning to Optimize, 2020, ECCV.
[56] Kartikeya Bhardwaj, et al. A Hardware Prototype Targeting Distributed Deep Learning for On-device Inference, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[57] Vinod Kathail, et al. Xilinx Vitis Unified Software Platform, 2020, FPGA.
[58] Jie Zhang, et al. Benchmarking High Bandwidth Memory on FPGAs, 2020, ArXiv.
[59] Nezihe Merve Gurel, et al. Compressive Sensing Using Iterative Hard Thresholding With Low Precision Data Representation: Theory and Applications, 2020, IEEE Transactions on Signal Processing.
[60] Gustavo Alonso, et al. Making Search Engines Faster by Lowering the Cost of Querying Business Rules Through FPGAs, 2020, SIGMOD Conference.
[61] Wen-mei W. Hwu, et al. SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems, 2019, MLSys.
[62] Martin D. Schatz, et al. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[63] Carole-Jean Wu, et al. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[64] Yujeong Choi, et al. PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[65] Gustavo Alonso, et al. BiS-KM: Enabling Any-Precision K-Means on FPGAs, 2020, FPGA.
[66] Carole-Jean Wu, et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[67] Tao Zhang, et al. EFLOPS: Algorithm and System Co-Design for a High Performance Distributed Training Platform, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[68] Minsoo Rhu, et al. Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[69] Torsten Hoefler, et al. Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis, 2019, FPGA.