Yinghai Lu | Martin D. Schatz | Michael Gschwind | Michael J. Anderson | Hector Yuen | Aravind Kalaiah | Peter Tang | Jongsoo Park | Summer Deng | Nadathur Satish | Olof Johansson | Narayanan Sundaram | Changkyu Kim | Garret Catron | Abhishek Dhanotia | Jordan Fix | Nick Gibson | Wenyin Fu | Avinash Nayak | Sam Naghshineh | Harsha Bojja | Aravind Anbudurai | Ying Zhang | Jason Liang | Shishir Juluri | Jaewon Lee | Adi Gangidi | Benny Chen | Stephen Chen | Haixin Liu | Jack Montgomery | Arun Moorthy | Chris Petersen | Bangsheng Tang | Amy Yang | Jiecao Yu | Vandana Balan | Joe Boyd | Matthew Breitbach | Claudio Caldato | Anna Calvo | Sneh Chandwani | Panos Christeas | Brad Cottel | Brian Coutinho | Arun Dalli | Oniel Duncan | Roman Dzhabarov | Simon Elmir | Chunli Fu | Michael Fulthorp | Sean Gordon | Beatriz Padilla Hernandez | Daniel Ho | Yu-Cheng Huang | et al.
[1] David M. Brooks, et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[2] Albert Gordo, et al. Rosetta: Large Scale System for Text Detection and Recognition in Images, 2018, KDD.
[3] David Patterson, et al. A Domain-Specific Supercomputer for Training Deep Neural Networks, 2020, Commun. ACM.
[4] Carole-Jean Wu, et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[5] Gu-Yeon Wei, et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Xuehai Qian, et al. AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[7] Eriko Nurvitadhi, et al. Scalable Multi-FPGA Acceleration for Large RNNs with Full Parallelism Levels, 2020, 2020 57th ACM/IEEE Design Automation Conference (DAC).
[8] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[9] Steffen Rendle, et al. Factorization Machines, 2010, 2010 IEEE International Conference on Data Mining.
[10] Joaquin Quiñonero Candela, et al. Practical Lessons from Predicting Clicks on Ads at Facebook, 2014, ADKDD'14.
[11] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[13] Minsoo Rhu, et al. TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning, 2019, MICRO.
[14] Hari Angepat, et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave, 2018, IEEE Micro.
[15] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[16] Dongup Kwon, et al. A Multi-Neural Network Acceleration Architecture, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[17] Kaiming He, et al. Exploring the Limits of Weakly Supervised Pretraining, 2018, ECCV.
[18] Vivienne Sze, et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, 2017, IEEE Journal of Solid-State Circuits.
[19] David A. Patterson, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[20] Qiang Liu, et al. Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data, 2020, ArXiv.
[21] Jia Wang, et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[22] Hyoukjun Kwon, et al. Heterogeneous Dataflow Accelerators for Multi-DNN Workloads, 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[23] Yuandong Tian, et al. FBNetV3: Joint Architecture-Recipe Search using Neural Acquisition Function, 2020, ArXiv.
[24] Yoav Shoham, et al. The Cost of Training NLP Models: A Concise Overview, 2020, ArXiv.
[25] Carole-Jean Wu, et al. MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance, 2020, IEEE Micro.
[26] Bertrand A. Maher, et al. Glow: Graph Lowering Compiler Techniques for Neural Networks, 2018, ArXiv.
[27] Sungroh Yoon, et al. Memory-Augmented Neural Networks on FPGA for Real-Time and Energy-Efficient Question Answering, 2021, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[28] Heng Wang, et al. Video Classification With Channel-Separated Convolutional Networks, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Asit K. Mishra, et al. From High-Level Deep Neural Models to FPGAs, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Dong Han, et al. Cambricon: An Instruction Set Architecture for Neural Networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[31] Dipankar Das, et al. Manna: An Accelerator for Memory-Augmented Neural Networks, 2019, MICRO.
[32] Eric S. Chung, et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[33] Alexander Aiken, et al. TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions, 2019, SOSP.
[34] Kaiming He, et al. Designing Network Design Spaces, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Carole-Jean Wu, et al. Understanding Capacity-Driven Scale-Out Neural Recommendation Inference, 2020, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[36] Hyoukjun Kwon, et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.
[37] Shuicheng Yan, et al. Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Yujeong Choi, et al. PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[39] Carole-Jean Wu, et al. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[40] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[41] Martin D. Schatz, et al. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[42] Jiyan Yang, et al. Post-Training 4-bit Quantization on Embedding Tables, 2019, ArXiv.
[43] Song Han, et al. SpArch: Efficient Architecture for Sparse Matrix Multiplication, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[44] Martin D. Schatz, et al. Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications, 2018, ArXiv.
[45] Du Tran, et al. What Makes Training Multi-Modal Classification Networks Hard?, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Pradeep Dubey, et al. A Study of BFLOAT16 for Deep Learning Training, 2019, ArXiv.
[47] Carole-Jean Wu, et al. Cross-Stack Workload Characterization of Deep Recommendation Systems, 2020, 2020 IEEE International Symposium on Workload Characterization (IISWC).
[48] Jaewon Lee, et al. MnnFast: A Fast and Scalable System Architecture for Memory-Augmented Neural Networks, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[49] Minsoo Rhu, et al. Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[50] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[51] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).