Efficient Neural Network Based Systems on Mobile and Cloud Platforms
[1] Anita Sellent,et al. Optical Flow Estimation versus Motion Estimation , 2012 .
[2] Yiran Chen,et al. An EDA framework for large scale hybrid neuromorphic computing systems , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[3] David Kirk,et al. NVIDIA CUDA software and GPU parallel computing architecture , 2007, ISMM '07.
[4] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[5] Blaise Agüera y Arcas,et al. Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.
[6] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[7] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[8] Kilian Q. Weinberger,et al. CondenseNet: An Efficient DenseNet Using Learned Group Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[9] Fedor Moiseev,et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned , 2019, ACL.
[10] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[11] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[12] Hai Li,et al. NeuralHMC: an efficient HMC-based accelerator for deep neural networks , 2019, ASP-DAC.
[13] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[14] Moshe Wasserblat,et al. Q8BERT: Quantized 8Bit BERT , 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[15] Yiran Chen,et al. MobiEye: An Efficient Cloud-based Video Detection System for Real-Time Mobile Applications , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[16] Yi Li,et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.
[17] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[18] Pushmeet Kohli,et al. PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions , 2015, NIPS.
[19] Roland Schweiger,et al. Robust Deep-Learning-Based Road-Prediction for Augmented Reality Navigation Systems at Night , 2016, 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC).
[20] Omer Levy,et al. Are Sixteen Heads Really Better than One? , 2019, NeurIPS.
[21] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.
[22] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[23] Jonathan Tompson,et al. MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.
[24] Yiran Chen,et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[25] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[26] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[28] Yann LeCun,et al. The MNIST database of handwritten digits , 2005 .
[29] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[30] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[31] Yanzhi Wang,et al. Reweighted Proximal Pruning for Large-Scale Language Representation , 2019, ArXiv.
[32] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[33] Robin Cheong. transformers.zip: Compressing Transformers with Pruning and Quantization , 2019 .
[34] Yiran Chen,et al. AdaLearner: An adaptive distributed mobile learning system for neural networks , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[35] Bin Yang,et al. SBNet: Sparse Blocks Network for Fast Inference , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[36] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Xuehai Qian,et al. HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[38] Fuchun Sun,et al. RON: Reverse Connection with Objectness Prior Networks for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[41] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[42] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[43] Yiming Yang,et al. DARTS: Differentiable Architecture Search , 2018, ICLR.
[44] Yiran Chen,et al. Differentiable Fine-grained Quantization for Deep Neural Network Compression , 2018, ArXiv.
[45] Ulrike von Luxburg,et al. A tutorial on spectral clustering , 2007, Stat. Comput..
[46] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Yiran Chen,et al. MoDNN: Local distributed mobile computing system for Deep Neural Network , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[48] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[49] Patrik O. Hoyer,et al. Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..
[50] Yang Li,et al. GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking , 2018, NeurIPS.
[51] Deyi Xiong,et al. Accelerating Neural Transformer via an Average Attention Network , 2018, ACL.
[52] Yichen Wei,et al. Towards High Performance Video Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[53] Xiaogang Wang,et al. DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.
[54] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[55] Wei Wen,et al. DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures , 2019, ICLR.
[56] Xing Wang,et al. Multi-Granularity Self-Attention for Neural Machine Translation , 2019, EMNLP.
[57] Feng Qian,et al. Understanding On-device Bufferbloat for Cellular Upload , 2016, Internet Measurement Conference.
[58] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[59] Tobias Domhan,et al. How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures , 2018, ACL.
[60] Kurt Keutzer,et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT , 2020, AAAI.
[61] Inderjit S. Dhillon,et al. Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.
[62] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[63] Dong Yu,et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs , 2014, INTERSPEECH.
[64] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[65] Yiran Chen,et al. MORPh: Mobile OLED-friendly recording and playback system for low power video streaming , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[66] Hassan Foroosh,et al. Sparse Convolutional Neural Networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[68] Matthew Mattina,et al. Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[69] Song Han,et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.
[70] Yiran Chen,et al. MeDNN: A distributed mobile system with enhanced partition and deployment for large-scale DNNs , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[71] Yichen Wei,et al. Deep Feature Flow for Video Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Oscar Alsing. Mobile Object Detection using TensorFlow Lite and Transfer Learning , 2018 .
[73] Tim Dettmers,et al. 8-Bit Approximations for Parallelism in Deep Learning , 2015, ICLR.
[74] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.
[75] Alexis M. Tourapis,et al. Fast motion estimation within the H.264 codec , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).
[76] Jiachen Mao,et al. DASNet: Dynamic Activation Sparsity for Neural Network Efficiency Improvement , 2019, 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI).
[77] J. Scott McCarley,et al. Pruning a BERT-based Question Answering Model , 2019, ArXiv.
[78] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[79] Jack Xin,et al. A Method for Finding Structured Sparse Solutions to Nonnegative Least Squares Problems with Applications , 2013, SIAM J. Imaging Sci..
[80] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[81] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[82] Walter Scheirer,et al. Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation , 2019, EMNLP.
[83] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[84] Ondrej Bojar,et al. Training Tips for the Transformer Model , 2018, Prague Bull. Math. Linguistics.
[85] Yixin Chen,et al. Compressing Neural Networks with the Hashing Trick , 2015, ICML.
[86] Fuchun Sun,et al. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] Mohit Thakkar. Custom Core ML Models Using Create ML , 2019 .
[88] Yiran Chen,et al. Running sparse and low-precision neural network: When algorithm meets hardware , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).
[89] Fang Liu,et al. Learning Intrinsic Sparse Structures within Long Short-term Memory , 2017, ICLR.
[90] Yiran Chen,et al. SPN Dash - Fast Detection of Adversarial Attacks on Mobile via Sensor Pattern Noise Fingerprinting , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[91] Daniel Camps-Mur,et al. Device-to-device communications with Wi-Fi Direct: overview and experimentation , 2013, IEEE Wireless Communications.
[92] Yiran Chen,et al. ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[93] Gunnar Farnebäck,et al. Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.
[94] Gaël Varoquaux,et al. The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.
[95] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[96] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[97] David S. Johnson,et al. Some Simplified NP-Complete Graph Problems , 1976, Theor. Comput. Sci..
[98] Samy Bengio,et al. Tensor2Tensor for Neural Machine Translation , 2018, AMTA.
[99] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[100] Xiaoxiao Li,et al. Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[101] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[102] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[103] Ziheng Wang,et al. Structured Pruning of Large Language Models , 2019, EMNLP.
[104] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[105] Ronald G. Dreslinski,et al. A hybrid approach to offloading mobile image classification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[106] Aurélien Plyer,et al. Massively parallel Lucas Kanade optical flow for real-time video processing applications , 2014, Journal of Real-Time Image Processing.