Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks
Srinivas Sridharan | Christina Delimitrou | Shengbao Zheng | Mingyu Liang | Louis Feng | P. Panakanti | Wenyin Fu | Zhongyi Lin
[1] Christina Delimitrou, et al. Ditto: End-to-End Application Cloning for Networked Cloud Services, 2023, International Conference on Architectural Support for Programming Languages and Operating Systems.
[2] E. K. Ardestani, et al. Building a Performance Model for Deep Learning Recommendation Model Training on GPUs, 2022, 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC).
[3] David J. Fleet, et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 2022, NeurIPS.
[4] Prafulla Dhariwal, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, ArXiv.
[5] Carole-Jean Wu, et al. Sustainable AI: Environmental Implications, Challenges and Opportunities, 2021, MLSys.
[6] Muhammet Mustafa Ozdal, et al. Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product, 2021, ISCA.
[7] Javier Duarte, et al. MLPerf Tiny Benchmark, 2021, NeurIPS Datasets and Benchmarks.
[8] Christina Delimitrou, et al. Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices, 2020.
[9] Ajay Joshi, et al. AI Tax in Mobile SoCs: End-to-end Performance Analysis of Machine Learning in Smartphones, 2021, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[10] Doe Hyun Yoon, et al. The Design Process for Google's Training Chips: TPUv2 and TPUv3, 2021, IEEE Micro.
[11] Shih-Hao Hung, et al. PerfNetRT: Platform-Aware Performance Modeling for Optimized Deep Neural Networks, 2020, 2020 International Computer Symposium (ICS).
[12] Carole-Jean Wu, et al. Chasing Carbon: The Elusive Environmental Footprint of Computing, 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[13] Carole-Jean Wu, et al. Cross-Stack Workload Characterization of Deep Recommendation Systems, 2020, 2020 IEEE International Symposium on Workload Characterization (IISWC).
[14] Mikko H. Lipasti, et al. MicroGrad: A Centralized Framework for Workload Cloning and Stress Testing, 2020, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[15] Ramesh Radhakrishnan, et al. Demystifying the MLPerf Training Benchmark Suite, 2020, 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[16] Amar Phanishayee, et al. Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training, 2020, USENIX Annual Technical Conference.
[17] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[18] Shijian Li, et al. Characterizing and Modeling Distributed Training with Transient Cloud GPU Servers, 2020, 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS).
[19] Carole-Jean Wu, et al. MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance, 2020, IEEE Micro.
[20] Ankit Patel, et al. Missing the Forest for the Trees: End-to-End AI Application Performance in Edge Data Centers, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Cody Coleman, et al. MLPerf Inference Benchmark, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[22] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[23] Cody A. Coleman, et al. MLPerf Training Benchmark, 2019, MLSys.
[24] Carole-Jean Wu, et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[25] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[26] Joseph McMahan, et al. Safer Program Behavior Sharing Through Trace Wringing, 2019, ASPLOS.
[27] Yuan He, et al. An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems, 2019, ASPLOS.
[28] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[29] A. Stephen McGough, et al. Predicting the Computational Cost of Deep Learning Models, 2018, 2018 IEEE International Conference on Big Data (Big Data).
[30] Tor M. Aamodt, et al. Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling, 2018, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[31] Amar Phanishayee, et al. Benchmarking and Analyzing Deep Neural Network Training, 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).
[32] Christina Delimitrou, et al. The Architectural Implications of Cloud Microservices, 2018, IEEE Computer Architecture Letters.
[33] Reena Panda, et al. CAMP: Accurate modeling of core and memory locality for proxy generation of big-data applications, 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[34] Guorui Zhou, et al. Deep Interest Network for Click-Through Rate Prediction, 2017, KDD.
[35] Reena Panda, et al. Statistical pattern based modeling of GPU memory access streams, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[36] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[37] Yan Solihin, et al. Clone morphing: Creating new workload behavior from existing applications, 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[38] Reena Panda, et al. Proxy Benchmarks for Emerging Big-Data Workloads, 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[39] Gu-Yeon Wei, et al. Fathom: reference workloads for modern deep learning methods, 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[40] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[41] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Hai Jin, et al. GPGPU-MiniBench: Accelerating GPGPU Micro-Architecture Simulation, 2015, IEEE Transactions on Computers.
[43] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Lizy Kurian John, et al. Automatic Generation of Miniaturized Synthetic Proxies for Target Applications to Efficiently Design Multicore Processors, 2014, IEEE Transactions on Computers.
[45] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.
[46] Christina Delimitrou, et al. Quasar: resource-efficient and QoS-aware cluster management, 2014, ASPLOS.
[47] Christina Delimitrou, et al. ECHO: Recreating network traffic maps for datacenters with tens of thousands of servers, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[48] Lieven Eeckhout, et al. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[49] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[50] Lieven Eeckhout, et al. Dispersing proprietary applications as benchmarks through code mutation, 2008, ASPLOS.
[51] Carlos González, et al. ATTILA: a cycle-level execution-driven simulator for modern GPU architectures, 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[52] Gennady Pekhimenko, et al. Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach, 2021, ArXiv.
[53] S. Sagar Imambi, et al. PyTorch, 2021, Programming with TensorFlow.
[54] 冯利芳. Facebook, 2020, The SAGE International Encyclopedia of Mass Media and Society.
[55] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[56] Kunle Olukotun, et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition, 2017.
[57] David A. Wood, et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator, 2015, IEEE Computer Architecture Letters.