Alham Fikri Aji | Radityo Eko Prasojo | Haryo Akbarianto Wibowo | Made Nindyatama Nityasya | Rendi Chevi