Carole-Jean Wu | Özgür Özkan | Zhuoran Zhao | Shin-Yeh Tsai | Mark Hempstead | Yavuz Yetim | Michael Lui
[1] Paul Covington, et al. Deep Neural Networks for YouTube Recommendations, 2016, RecSys.
[2] Newsha Ardalani, et al. Beyond human-level accuracy: computational challenges in deep learning, 2019, PPoPP.
[3] David M. Brooks, et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, 2018, HPCA.
[4] Carole-Jean Wu, et al. Cross-Stack Workload Characterization of Deep Recommendation Systems, 2020, IISWC.
[5] Yehuda Koren, et al. Matrix Factorization Techniques for Recommender Systems, 2009, Computer.
[6] Minsoo Rhu, et al. Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations, 2020, ISCA.
[7] Carole-Jean Wu, et al. Developing a Recommendation Benchmark for MLPerf Training and Inference, 2020, ArXiv.
[8] Alexander Aiken, et al. Beyond Data and Model Parallelism for Deep Neural Networks, 2018, SysML.
[9] Carole-Jean Wu, et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation, 2020, HPCA.
[10] Guorui Zhou, et al. Deep Interest Network for Click-Through Rate Prediction, 2017, KDD.
[11] Jichuan Chang, et al. Software-Defined Far Memory in Warehouse-Scale Computers, 2019, ASPLOS.
[12] Cody Coleman, et al. MLPerf Inference Benchmark, 2020, ISCA.
[13] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[14] Carole-Jean Wu, et al. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference, 2020, ISCA.
[15] Vikas Raunak. Simple and Effective Dimensionality Reduction for Word Embeddings, 2017.
[16] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[17] Wilson C. Hsieh, et al. Bigtable: A Distributed Storage System for Structured Data, 2006, TOCS.
[18] Christopher Olston, et al. TensorFlow-Serving: Flexible, High-Performance ML Serving, 2017, ArXiv.
[19] Jingyuan Zhang, et al. AIBox: CTR Prediction Model Training on a Single Node, 2019, CIKM.
[20] Nikhil R. Devanur, et al. PipeDream: generalized pipeline parallelism for DNN training, 2019, SOSP.
[21] Martin D. Schatz, et al. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing, 2020, ISCA.
[22] Zi Yin, et al. On the Dimensionality of Word Embedding, 2018, NeurIPS.
[23] Yunming Ye, et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, 2017, IJCAI.
[24] Kaushik Veeraraghavan, et al. Canopy: An End-to-End Performance Tracing And Analysis System, 2017, SOSP.
[25] Dustin Tran, et al. Mesh-TensorFlow: Deep Learning for Supercomputers, 2018, NeurIPS.
[26] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[27] Bracha Shapira, et al. Recommender Systems Handbook, 2015, Springer US.
[28] David Patterson, et al. MLPerf Training Benchmark, 2019, MLSys.
[29] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, 2018.
[30] Pramod Viswanath, et al. All-but-the-Top: Simple and Effective Postprocessing for Word Representations, 2017, ICLR.
[31] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[32] Jure Leskovec, et al. Graph Convolutional Neural Networks for Web-Scale Recommender Systems, 2018, KDD.
[33] Onur Mutlu, et al. Base-delta-immediate compression: Practical data compression for on-chip caches, 2012, PACT.
[34] David A. Wood, et al. Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches, 2004.
[35] Ed H. Chi, et al. Factorized Deep Retrieval and Distributed TensorFlow Serving, 2018.
[36] Sachin Katti, et al. Bandana: Using Non-volatile Memory for Storing Deep Learning Models, 2018, MLSys.
[37] Bor-Yiing Su, et al. Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems, 2020, ArXiv.
[38] Ping Li, et al. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems, 2020, MLSys.
[39] Valentin Khrulkov, et al. Tensorized Embedding Layers for Efficient Model Compression, 2019, ArXiv.
[40] Levent Sagun, et al. Scaling description of generalization with number of parameters in deep learning, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[41] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[42] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[43] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[44] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[45] Trevor N. Mudge, et al. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge, 2017, ASPLOS.
[46] Yang Yang, et al. Deep Learning Scaling is Predictable, Empirically, 2017, ArXiv.
[47] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[48] Donald Beaver, et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, 2010.
[49] Hideki Nakayama, et al. Compressing Word Embeddings via Deep Compositional Code Learning, 2017, ICLR.
[50] Minsoo Rhu, et al. Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training, 2021, HPCA.