Carole-Jean Wu | Kim M. Hazelwood | Xiaodong Wang | Bilge Acun | Matthew Murphy | Jade Nie
[1] Martin D. Schatz, et al. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[2] Minsoo Rhu, et al. TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning, 2019, MICRO.
[3] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[4] Yuandong Tian, et al. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Tomas Mikolov, et al. Improving Supervised Bilingual Mapping of Word Embeddings, 2018, ArXiv.
[6] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[7] Paul Covington, et al. Deep Neural Networks for YouTube Recommendations, 2016, RecSys.
[8] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[9] Carole-Jean Wu, et al. CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery, 2020, ArXiv.
[10] Wei Zhang, et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, 2017, NIPS.
[11] Zheng Shao, et al. Data warehousing and analytics infrastructure at Facebook, 2010, SIGMOD Conference.
[12] Developing a Recommendation Benchmark for MLPerf Training and Inference, 2020, ArXiv.
[13] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[14] Carole-Jean Wu, et al. Cross-Stack Workload Characterization of Deep Recommendation Systems, 2020, 2020 IEEE International Symposium on Workload Characterization (IISWC).
[15] Jeff Johnson, et al. Billion-Scale Similarity Search with GPUs, 2017, IEEE Transactions on Big Data.
[16] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[17] Carole-Jean Wu, et al. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[18] Hervé Jégou, et al. Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion, 2018, EMNLP.
[19] Guorui Zhou, et al. Deep Interest Network for Click-Through Rate Prediction, 2017, KDD.
[20] Ed H. Chi, et al. Factorized Deep Retrieval and Distributed TensorFlow Serving, 2018.
[21] Wei Zhang, et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent, 2017, ICML.
[22] Cody Coleman, et al. MLPerf Inference Benchmark, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[23] James Zou, et al. Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems, 2019, 2021 IEEE International Symposium on Information Theory (ISIT).
[24] Sachin Katti, et al. Bandana: Using Non-volatile Memory for Storing Deep Learning Models, 2018, MLSys.
[25] Chang Zhou, et al. Deep Interest Evolution Network for Click-Through Rate Prediction, 2018, AAAI.
[26] Heng-Tze Cheng, et al. Wide & Deep Learning for Recommender Systems, 2016, DLRS@RecSys.
[27] Bor-Yiing Su, et al. Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems, 2020, ArXiv.
[28] Ross B. Girshick, et al. Mask R-CNN, 2017, arXiv:1703.06870.
[29] Ping Li, et al. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems, 2020, MLSys.
[30] David M. Brooks, et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[31] Forrest N. Iandola, et al. How to scale distributed deep learning?, 2016, ArXiv.
[32] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[33] Erik R. Altman, et al. Predicting GPU Performance from CPU Runs Using Machine Learning, 2014, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing.
[34] Carole-Jean Wu, et al. MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance, 2020, IEEE Micro.
[35] Luiz André Barroso, et al. The tail at scale, 2013, CACM.
[36] Franck Cappello, et al. DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models, 2020, 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID).
[37] Carole-Jean Wu, et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[38] Ross B. Girshick, et al. Focal Loss for Dense Object Detection, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[39] Alexandros Karatzoglou, et al. Deep Learning for Recommender Systems, 2017, RecSys.
[40] David Patterson, et al. MLPerf Training Benchmark, 2019, MLSys.
[41] Xiaodong He, et al. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems, 2015, WWW.
[42] Takuya Akiba, et al. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes, 2017, ArXiv.
[43] Yann LeCun, et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[44] Myle Ott, et al. Understanding Back-Translation at Scale, 2018, EMNLP.
[45] Yuandong Tian, et al. Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction, 2020, KDD.
[46] B. Karrer, et al. AE: A domain-agnostic platform for adaptive experimentation, 2018.
[47] Carole-Jean Wu, et al. Understanding Capacity-Driven Scale-Out Neural Recommendation Inference, 2020, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[48] Eric P. Xing, et al. Fault Tolerance in Iterative-Convergent Machine Learning, 2018, ICML.
[49] Chinmay Hegde, et al. Collaborative Deep Learning in Fixed Topology Networks, 2017, NIPS.
[50] Xiaojin Zhu, et al. Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).