Alaa Maalouf | Dan Feldman | Daniela Rus | Harry Lang
[1] Murad Tukan, et al. Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation, 2020, ArXiv.
[2] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[3] Rahul Goel, et al. Online Embedding Compression for Text Classification using Low Rank Matrix Factorization, 2018, AAAI.
[4] Xin Jiang, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2019, Findings of EMNLP.
[5] Murad Tukan, et al. Coresets for Near-Convex Functions, 2020, NeurIPS.
[6] Zhixun Su, et al. Fixed-rank representation for unsupervised visual learning, 2012, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proceedings of the IEEE.
[8] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[9] Ahmed Hassan Awadallah, et al. Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data, 2019.
[10] Klemens Böhm, et al. One-Class Active Learning for Outlier Detection with Multiple Subspaces, 2019, CIKM.
[11] Jimmy J. Lin, et al. MKD: A Multi-Task Knowledge Distillation Approach for Pretrained Language Models, 2019, ArXiv.
[12] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[13] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[14] J. Scott McCarley, et al. Structured Pruning of a BERT-based Question Answering Model, 2019.
[15] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[16] Harry Shum, et al. Concurrent subspaces analysis, 2005, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Yanzhi Wang, et al. Reweighted Proximal Pruning for Large-Scale Language Representation, 2019, ArXiv.
[18] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[19] Kasturi R. Varadarajan, et al. No Coreset, No Cry: II, 2005, FSTTCS.
[20] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[21] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[22] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2019, EMNLP.
[23] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[24] Pascal Frossard, et al. Dictionary Learning, 2011, IEEE Signal Processing Magazine.
[25] S. P. Lloyd. Least squares quantization in PCM, 1982, IEEE Transactions on Information Theory.
[26] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[27] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[28] Xiaokang Yang, et al. Learning dictionary via subspace segmentation for sparse representation, 2011, IEEE International Conference on Image Processing (ICIP).
[29] Dacheng Tao, et al. On Compressing Deep Models by Low Rank and Sparse Decomposition, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Guillermo Sapiro, et al. Supervised Dictionary Learning, 2008, NIPS.
[31] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[32] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[33] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[34] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[35] Jimmy J. Lin, et al. Attentive Student Meets Multi-Task Teacher: Improved Knowledge Distillation for Pretrained Models, 2019, ArXiv.
[36] Ahmed Hassan Awadallah, et al. Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data, 2019, ArXiv.
[37] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[38] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC2-NeurIPS).
[39] David P. Woodruff, et al. Input Sparsity and Hardness for Robust Subspace Approximation, 2015, IEEE Symposium on Foundations of Computer Science (FOCS).
[40] D. Rubin, et al. Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion), 1977.
[41] Mitchell A. Gordon, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP.
[42] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[43] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[44] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[45] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[46] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.
[47] J. Scott McCarley, et al. Pruning a BERT-based Question Answering Model, 2019, ArXiv.