Katikapalli Subramanyam Kalyan | Ajit Rajasekharan | Sivanesan Sangeetha
[1] Aswin Sivaraman,et al. Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement , 2020, ArXiv.
[2] Liang Xu,et al. CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model , 2020, ArXiv.
[3] Colin Raffel,et al. ByT5: Towards a token-free future with pre-trained byte-to-byte models , 2021, ArXiv.
[4] Benoît Sagot,et al. Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures , 2019 .
[5] Hongxia Yang,et al. OAG-BERT: Pre-train Heterogeneous Entity-augmented Academic Language Models , 2021, ArXiv.
[6] D. Tao,et al. A Survey on Visual Transformer , 2020, ArXiv.
[7] Kevin Duh,et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning , 2020, RepL4NLP@ACL.
[8] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[9] Marcel Salathé,et al. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter , 2020, Frontiers in Artificial Intelligence.
[10] Kaisheng M. Wang,et al. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation , 2021, ArXiv.
[11] Gaurav Menghani,et al. Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better , 2021, ACM Comput. Surv..
[12] Xu Tan,et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation , 2019, ICML.
[13] Tommaso Caselli,et al. HateBERT: Retraining BERT for Abusive Language Detection in English , 2020, WOAH.
[14] Peng Zhou,et al. Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling , 2016, COLING.
[15] Lukasz Kaiser,et al. Rethinking Attention with Performers , 2020, ArXiv.
[16] Furu Wei,et al. Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains , 2021, FINDINGS.
[17] Jungo Kasai,et al. GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation , 2021, ArXiv.
[18] Matt J. Kusner,et al. A Survey on Contextual Embeddings , 2020, ArXiv.
[19] Y. Matsumura,et al. A Pre-Training Technique to Localize Medical BERT and to Enhance Biomedical BERT , 2020 .
[20] Julian J. McAuley,et al. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering , 2016, WWW.
[21] Xavier Amatriain,et al. Domain-Relevant Embeddings for Medical Question Similarity , 2019, ArXiv.
[22] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[23] Yoav Shoham,et al. SenseBERT: Driving Some Sense into BERT , 2019, ACL.
[24] Fedor Moiseev,et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned , 2019, ACL.
[25] Dat Quoc Nguyen,et al. PhoBERT: Pre-trained language models for Vietnamese , 2020, FINDINGS.
[26] Kyle Lo,et al. S2ORC: The Semantic Scholar Open Research Corpus , 2020, ACL.
[27] Percy Liang,et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation , 2021, ACL.
[28] Phil Blunsom,et al. A Convolutional Neural Network for Modelling Sentences , 2014, ACL.
[29] Piotr Rybak,et al. KLEJ: Comprehensive Benchmark for Polish Language Understanding , 2020, ACL.
[30] Olatunji Ruwase,et al. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters , 2020, KDD.
[31] Xuanjing Huang,et al. Recurrent Neural Network for Text Classification with Multi-Task Learning , 2016, IJCAI.
[32] Marius Mosbach,et al. On the Interplay Between Fine-tuning and Sentence-Level Probing for Linguistic Knowledge in Pre-Trained Transformers , 2020, BLACKBOXNLP.
[33] Mona Attariyan,et al. Parameter-Efficient Transfer Learning for NLP , 2019, ICML.
[34] Iz Beltagy,et al. SciBERT: A Pretrained Language Model for Scientific Text , 2019, EMNLP.
[35] Graham Neubig,et al. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization , 2020, ICML.
[36] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[37] Tianyu Gao,et al. SimCSE: Simple Contrastive Learning of Sentence Embeddings , 2021, EMNLP.
[38] Leonardo Neves,et al. TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification , 2020, FINDINGS.
[39] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[40] Hyunjae Lee,et al. KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding , 2021, 2020 25th International Conference on Pattern Recognition (ICPR).
[41] Gabriel Synnaeve,et al. CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings , 2021, NeurIPS.
[42] Roland Vollgraf,et al. Contextual String Embeddings for Sequence Labeling , 2018, COLING.
[43] Sarana Nutanong,et al. WangchanBERTa: Pretraining transformer-based Thai Language Models , 2021, ArXiv.
[44] Jaewoo Kang,et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform..
[45] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[46] Mohammad Manthouri,et al. ParsBERT: Transformer-based Model for Persian Language Understanding , 2020, Neural Processing Letters.
[47] Anjali Agrawal,et al. Shuffled-token Detection for Refining Pre-trained RoBERTa , 2021, NAACL.
[48] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[49] Xiaodong Liu,et al. Multi-Task Deep Neural Networks for Natural Language Understanding , 2019, ACL.
[50] Manish Gupta,et al. Compression of Deep Learning Models for Text: A Survey , 2022, ACM Trans. Knowl. Discov. Data.
[51] Nigel Collier,et al. Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders , 2021, EMNLP.
[52] Maosong Sun,et al. ERNIE: Enhanced Language Representation with Informative Entities , 2019, ACL.
[53] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[54] Jennifer J. Liang,et al. Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning , 2020, JMIR medical informatics.
[55] Michal Perelkiewicz,et al. Pre-training Polish Transformer-based Language Models at Scale , 2020, ICAISC.
[56] Deniz Yuret,et al. KU_ai at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI , 2019, BioNLP@ACL.
[57] Bridget T. McInnes,et al. MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning , 2020, J. Am. Medical Informatics Assoc..
[58] Diyi Yang,et al. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics , 2021, GEM.
[59] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[60] Dat Quoc Nguyen,et al. BERTweet: A pre-trained language model for English Tweets , 2020, EMNLP.
[61] Dilek Z. Hakkani-Tür,et al. DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue , 2020, ArXiv.
[62] Philipp Dufter,et al. Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models , 2021, EACL.
[63] Iryna Gurevych,et al. What to Pre-Train on? Efficient Intermediate Task Selection , 2021, EMNLP.
[64] Jaewoo Kang,et al. Transferability of Natural Language Inference to Biomedical Question Answering , 2020, CLEF.
[65] Tapio Salakoski,et al. Multilingual is not enough: BERT for Finnish , 2019, ArXiv.
[66] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[67] Alex Wang,et al. Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling , 2018, ACL.
[68] Zhuosheng Zhang,et al. LIMIT-BERT : Linguistic Informed Multi-Task BERT , 2020, EMNLP.
[69] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[70] Hieu Tran,et al. CoTexT: Multi-task Learning with Code-Text Transformer , 2021, NLP4PROG.
[71] Marco Basaldella,et al. Self-alignment Pre-training for Biomedical Entity Representations , 2020, ArXiv.
[72] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[73] Yang Zhang,et al. BioMegatron: Larger Biomedical Domain Language Model , 2020, EMNLP.
[74] Moshe Wasserblat,et al. Q8BERT: Quantized 8Bit BERT , 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[75] Furu Wei,et al. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers , 2020, NeurIPS.
[76] Hiroyuki Shindo,et al. Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation , 2016, CoNLL.
[77] Michael W. Mahoney,et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT , 2019, AAAI.
[78] Roberto de Alencar Lotufo,et al. BERTimbau: Pretrained BERT Models for Brazilian Portuguese , 2020, BRACIS.
[79] Hinrich Schütze,et al. Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking , 2019, AAAI.
[80] Kyle Lo,et al. FLEX: Unifying Evaluation for Few-Shot NLP , 2021, NeurIPS.
[81] Ming Zhou,et al. InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training , 2021, NAACL.
[82] Helen Chen,et al. UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus , 2021, NAACL.
[83] Xipeng Qiu,et al. Pre-trained models for natural language processing: A survey , 2020, Science China Technological Sciences.
[84] Martin Malmsten,et al. Playing with Words at the National Library of Sweden - Making a Swedish BERT , 2020, ArXiv.
[85] Neel Sundaresan,et al. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation , 2021, NeurIPS Datasets and Benchmarks.
[86] Dian Yu,et al. CLUE: A Chinese Language Understanding Evaluation Benchmark , 2020, COLING.
[87] Hazem Hajj,et al. AraBERT: Transformer-based Model for Arabic Language Understanding , 2020, OSACT.
[88] Iryna Gurevych,et al. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks , 2019, EMNLP.
[89] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[90] Anna Korhonen,et al. Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity , 2019, COLING.
[91] Kazem Rahimi,et al. BEHRT: Transformer for Electronic Health Records , 2019, Scientific Reports.
[92] Yu Cheng,et al. Patient Knowledge Distillation for BERT Model Compression , 2019, EMNLP.
[93] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[94] Shuicheng Yan,et al. ConvBERT: Improving BERT with Span-based Dynamic Convolution , 2020, NeurIPS.
[95] Qingyu Chen,et al. An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining , 2020, BIONLP.
[96] Vivek Srikumar,et al. A Closer Look at How Fine-tuning Changes BERT , 2021, ArXiv.
[97] Xiaodong Liu,et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing , 2020, ACM Trans. Comput. Heal..
[98] Li Dong,et al. Cross-Lingual Natural Language Generation via Pre-Training , 2020, AAAI.
[99] Guoao Wei,et al. FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark , 2021, ArXiv.
[100] Tommaso Caselli,et al. BERTje: A Dutch BERT Model , 2019, ArXiv.
[101] Qun Liu,et al. TernaryBERT: Distillation-aware Ultra-low Bit BERT , 2020, EMNLP.
[102] Roy Schwartz,et al. Knowledge Enhanced Contextual Word Representations , 2019, EMNLP/IJCNLP.
[103] Omer Levy,et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.
[104] Ting Liu,et al. CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision , 2021, ArXiv.
[105] Alena Fenogenova,et al. RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark , 2020, EMNLP.
[106] Ke Xu,et al. Investigating Learning Dynamics of BERT Fine-Tuning , 2020, AACL.
[107] Olivier Bodenreider,et al. The Unified Medical Language System (UMLS): integrating biomedical terminology , 2004, Nucleic Acids Res..
[108] Yi Yang,et al. FinBERT: A Pretrained Language Model for Financial Communications , 2020, ArXiv.
[109] Antoine Louis. NetBERT: A Pre-trained Language Representation Model for Computer Networking , 2020 .
[110] Preslav Nakov,et al. Poor Man's BERT: Smaller and Faster Transformer Models , 2020, ArXiv.
[111] Veselin Stoyanov,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[112] Marcin Junczys-Dowmunt,et al. The United Nations Parallel Corpus v1.0 , 2016, LREC.
[113] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[114] Rich Caruana,et al. Do Deep Nets Really Need to be Deep? , 2013, NIPS.
[115] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[116] Amaru Cuba Gyllensten,et al. Semantic Re-tuning with Contrastive Tension , 2021, ICLR.
[117] Taku Kudo,et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates , 2018, ACL.
[118] Evangelos Kanoulas,et al. A Benchmark for Lease Contract Review , 2020, ArXiv.
[119] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[120] Noah A. Smith,et al. Variational Pretraining for Semi-supervised Text Classification , 2019, ACL.
[121] Lysandre Debut,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[122] Lukasz Kaiser,et al. Universal Transformers , 2018, ICLR.
[123] Myle Ott,et al. Larger-Scale Transformers for Multilingual Masked Language Modeling , 2021, REPL4NLP.
[124] Qun Liu,et al. TinyBERT: Distilling BERT for Natural Language Understanding , 2020, EMNLP.
[125] Noam Shazeer,et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity , 2021, ArXiv.
[126] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[127] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[128] Pengtao Xie,et al. CERT: Contrastive Self-supervised Learning for Language Understanding , 2020, ArXiv.
[129] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[130] Zhengyun Zhao,et al. CODER: Knowledge-infused cross-lingual medical term embedding for term normalization , 2020, J. Biomed. Informatics.
[131] Benjamin Lecouteux,et al. FlauBERT: Unsupervised Language Model Pre-training for French , 2020, LREC.
[132] Ayu Purwarianti,et al. IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding , 2020, AACL.
[133] Hinrich Schütze,et al. Negated LAMA: Birds cannot fly , 2019, ArXiv.
[134] Atreyee Dey,et al. MuRIL: Multilingual Representations for Indian Languages , 2021, ArXiv.
[135] Anindya Iqbal,et al. BanglaBERT: Combating Embedding Barrier for Low-Resource Language Understanding , 2021, ArXiv.
[136] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[137] Giovanni Semeraro,et al. AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets , 2019, CLiC-it.
[138] Jinlan Fu,et al. XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation , 2021, EMNLP.
[139] Marjan Ghazvininejad,et al. Multilingual Denoising Pre-training for Neural Machine Translation , 2020, Transactions of the Association for Computational Linguistics.
[140] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[141] Erik T. Mueller,et al. Open Mind Common Sense: Knowledge Acquisition from the General Public , 2002, OTM.
[142] Quoc V. Le,et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.
[143] Fan Yang,et al. XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation , 2020, EMNLP.
[144] Sebastian Riedel,et al. Language Models as Knowledge Bases? , 2019, EMNLP.
[145] Guoming Zhang,et al. Cloud-based intelligent self-diagnosis and department recommendation service using Chinese medical BERT , 2021, J. Cloud Comput..
[146] Holger Schwenk,et al. WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia , 2019, EACL.
[147] Ahmadreza Mosallanezhad,et al. ParsiNLU: A Suite of Language Understanding Challenges for Persian , 2020, Transactions of the Association for Computational Linguistics.
[148] Lutfi Kerem Senel,et al. Does She Wink or Does She Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models , 2021, EACL.
[149] Ziqian Xie,et al. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction , 2020, npj Digital Medicine.
[150] Ming Zhou,et al. GraphCodeBERT: Pre-training Code Representations with Data Flow , 2020, ICLR.
[151] Kevin Duh,et al. Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? , 2019, TACL.
[152] Ulli Waltinger,et al. Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA , 2020, EMNLP.
[153] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[154] Furu Wei,et al. mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs , 2021, EMNLP.
[155] Graham Neubig,et al. X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models , 2020, EMNLP.
[156] Ruslan Salakhutdinov,et al. Towards Understanding and Mitigating Social Biases in Language Models , 2021, ICML.
[157] Xingyi Cheng,et al. Dual-View Distilled BERT for Sentence Embedding , 2021, SIGIR.
[158] Hazem Hajj,et al. AraGPT2: Pre-Trained Transformer for Arabic Language Generation , 2021, WANLP.
[159] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[160] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[161] Xiaocheng Feng,et al. CodeBERT: A Pre-Trained Model for Programming and Natural Languages , 2020, EMNLP.
[162] Hai-Tao Zheng,et al. CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding , 2021, ACL.
[163] Thamar Solorio,et al. LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation , 2020, LREC.
[164] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[165] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[166] Timothy Baldwin,et al. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP , 2020, COLING.
[167] Fuzheng Zhang,et al. ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer , 2021, ACL.
[168] Pushpak Bhattacharyya,et al. The IIT Bombay English-Hindi Parallel Corpus , 2017, LREC.
[169] Graham Neubig,et al. How Can We Know What Language Models Know? , 2019, Transactions of the Association for Computational Linguistics.
[170] Qianghuai Jia,et al. Conceptualized Representation Learning for Chinese Biomedical Text Mining , 2020, ArXiv.
[171] Muhammad Abdul-Mageed,et al. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic , 2020, ACL.
[172] Barry Haddow,et al. PMIndia - A Collection of Parallel Corpora of Languages of India , 2020, ArXiv.
[173] Brian Lester,et al. The Power of Scale for Parameter-Efficient Prompt Tuning , 2021, EMNLP.
[174] Nadir Durrani,et al. Analyzing Redundancy in Pretrained Transformer Models , 2020, EMNLP.
[175] Elahe Rahimtoroghi,et al. What Happens To BERT Embeddings During Fine-tuning? , 2020, BLACKBOXNLP.
[176] Ali Farhadi,et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping , 2020, ArXiv.
[177] Yin Yang,et al. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT , 2020, Transactions of the Association for Computational Linguistics.
[178] Weizhe Yuan,et al. BARTScore: Evaluating Generated Text as Text Generation , 2021, NeurIPS.
[179] Luo Si,et al. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding , 2019, ICLR.
[180] Wei-Hung Weng,et al. Publicly Available Clinical BERT Embeddings , 2019, Proceedings of the 2nd Clinical Natural Language Processing Workshop.
[181] Ion Androutsopoulos,et al. LEGAL-BERT: “Preparing the Muppets for Court” , 2020, FINDINGS.
[182] Jiancheng Lv,et al. GLGE: A New General Language Generation Evaluation Benchmark , 2021, FINDINGS.
[183] Philip S. Yu,et al. Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT , 2020, ArXiv.
[184] Yen-Pin Chen,et al. Modified Bidirectional Encoder Representations From Transformers Extractive Summarization Model for Hospital Information Systems Based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation , 2020, JMIR medical informatics.
[185] Dan Roth,et al. Extending Multilingual BERT to Low-Resource Languages , 2020, FINDINGS.
[186] Colin Raffel,et al. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer , 2021, NAACL.
[187] Jie Zhou,et al. SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis , 2020, COLING.
[188] Dogu Araci,et al. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models , 2019, ArXiv.
[189] Ming Zhou,et al. Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks , 2019, EMNLP.
[190] Taranjit Kaur,et al. Automated Brain Image Classification Based on VGG-16 and Transfer Learning , 2019, 2019 International Conference on Information Technology (ICIT).
[191] Iain Murray,et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning , 2019, ICML.
[192] Mitesh M. Khapra,et al. iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages , 2020, FINDINGS.
[193] Aline Villavicencio,et al. The brWaC Corpus: A New Open Resource for Brazilian Portuguese , 2018, LREC.
[194] Jie Tang,et al. Self-Supervised Learning: Generative or Contrastive , 2020, IEEE Transactions on Knowledge and Data Engineering.
[195] Christo Kirov,et al. Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset , 2020, LREC.
[196] William Speier,et al. Bidirectional Representation Learning From Transformers Using Multimodal Electronic Health Record Data to Predict Depression , 2021, IEEE Journal of Biomedical and Health Informatics.
[197] Timothy Baldwin,et al. Learning from Unlabelled Data for Clinical Semantic Textual Similarity , 2020, CLINICALNLP.
[198] Wanxiang Che,et al. TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing , 2020, ACL.
[199] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[200] Iryna Gurevych,et al. How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models , 2021, ACL/IJCNLP.
[201] Li Yang,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[202] Beliz Gunel,et al. Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning , 2020, ICLR.
[203] Li Yang,et al. ETC: Encoding Long and Structured Inputs in Transformers , 2020, EMNLP.
[204] Samuel R. Bowman,et al. Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work? , 2020, ACL.
[205] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[206] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[207] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[208] Mingxuan Wang,et al. LightSeq: A High Performance Inference Library for Transformers , 2021, NAACL.
[209] Sebastian Gehrmann,et al. exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models , 2019, ArXiv.
[210] John Wieting,et al. CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation , 2021, ArXiv.
[211] Hiroaki Hayashi,et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing , 2021, ACM Comput. Surv..
[212] Zhengxiao Du,et al. GPT Understands, Too , 2021, AI Open.
[213] Rich Caruana,et al. Model compression , 2006, KDD '06.
[214] Colin Raffel,et al. Extracting Training Data from Large Language Models , 2020, USENIX Security Symposium.
[215] Henghui Zhu,et al. Enhancing Clinical BERT Embedding using a Biomedical Knowledge Base , 2020, COLING.
[216] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[217] Jinlan Fu,et al. ExplainaBoard: An Explainable Leaderboard for NLP , 2021, ACL.
[218] Vedant Misra,et al. Black Box Attacks on Transformer Language Models , 2019 .
[219] Pierre Zweigenbaum,et al. CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters , 2020, COLING.
[220] Het Shah,et al. KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization , 2020, ArXiv.
[221] Furu Wei,et al. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA , 2021, ArXiv.
[222] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[223] Peter Szolovits,et al. MIMIC-III, a freely accessible critical care database , 2016, Scientific Data.
[224] Hazem Hajj,et al. AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding , 2020, ArXiv.
[225] Morteza Ziyadi,et al. MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers , 2020, ArXiv.
[226] Zhiyong Lu,et al. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets , 2019, BioNLP@ACL.
[227] Luis Espinosa Anke,et al. XLM-T: A Multilingual Language Model Toolkit for Twitter , 2021, ArXiv.
[228] Xipeng Qiu,et al. A Survey of Transformers , 2021, AI Open.
[229] Sha Yuan,et al. WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models , 2021, AI Open.
[230] Ting Liu,et al. CharBERT: Character-aware Pre-trained Language Model , 2020, COLING.
[231] Zhiyuan Liu,et al. CPM-2: Large-scale Cost-effective Pre-trained Language Models , 2021, AI Open.
[232] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[233] Kyunghyun Cho,et al. Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models , 2020, ICLR.
[234] Zhen Qin,et al. Charformer: Fast Character Transformers via Gradient-based Subword Tokenization , 2021, ArXiv.
[235] Andrew McCallum,et al. Energy and Policy Considerations for Deep Learning in NLP , 2019, ACL.
[236] Jugal Kalita,et al. Multi-task learning for natural language processing in the 2020s: where are we going? , 2020, Pattern Recognit. Lett..
[237] Shuangzhi Wu,et al. Alternating Language Modeling for Cross-Lingual Pre-Training , 2020, AAAI.
[238] Yonghui Wu,et al. Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models , 2020, JMIR medical informatics.
[239] Edouard Grave,et al. Training with Quantization Noise for Extreme Model Compression , 2020, ICLR.
[240] Christopher M. Danforth,et al. Interpretable Bias Mitigation for Textual Data: Reducing Genderization in Patient Notes While Maintaining Classification Performance , 2021, ACM Trans. Comput. Heal..
[241] Sampo Pyysalo,et al. The birth of Romanian BERT , 2020, FINDINGS.
[242] Mikhail Arkhipov,et al. Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language , 2019, ArXiv.
[243] Monojit Choudhury,et al. GLUECoS: An Evaluation Benchmark for Code-Switched NLP , 2020, ACL.
[244] Omer Levy,et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans , 2019, TACL.
[245] Alexey Sorokin,et al. Tuning Multilingual Transformers for Language-Specific Named Entity Recognition , 2019, BSNLP@ACL.
[246] Kwan Hui Lim,et al. An Unsupervised Sentence Embedding Method by Mutual Information Maximization , 2020, EMNLP.
[247] Zhi Tang,et al. MathBERT: A Pre-Trained Model for Mathematical Formula Understanding , 2021, ArXiv.
[248] Danqi Chen,et al. Making Pre-trained Language Models Better Few-shot Learners , 2021, ACL/IJCNLP.
[249] Dacheng Tao,et al. A Survey on Multi-view Learning , 2013, ArXiv.
[250] Zhiyuan Liu,et al. Knowledge Inheritance for Pre-trained Language Models , 2021, ArXiv.
[251] Yiming Yang,et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices , 2020, ACL.
[252] Diedre Carmo,et al. PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data , 2020, ArXiv.
[253] Anoop Kunchukuttan,et al. Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages , 2021, ArXiv.
[254] El Moatez Billah Nagoudi,et al. IndT5: A Text-to-Text Transformer for 10 Indigenous Languages , 2021, AMERICASNLP.
[255] Hinrich Schütze,et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners , 2020, NAACL.
[256] Anna Rumshisky,et al. Revealing the Dark Secrets of BERT , 2019, EMNLP.
[257] Kai-Wei Chang,et al. Unified Pre-training for Program Understanding and Generation , 2021, NAACL.
[258] Alice H. Oh,et al. KLUE: Korean Language Understanding Evaluation , 2021, NeurIPS Datasets and Benchmarks.
[259] Bill Yuchen Lin,et al. Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning , 2021, ACL.
[260] Vishrav Chaudhary,et al. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data , 2019, LREC.
[261] Marius Mosbach,et al. On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines , 2020, ArXiv.
[262] Jianmo Ni,et al. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects , 2019, EMNLP.
[263] Ali Farhadi,et al. Defending Against Neural Fake News , 2019, NeurIPS.
[264] Xiaodong Liu,et al. Unified Language Model Pre-training for Natural Language Understanding and Generation , 2019, NeurIPS.
[265] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[266] Sameer Singh,et al. Eliciting Knowledge from Language Models Using Automatically Generated Prompts , 2020, EMNLP.
[267] Philipp Koehn,et al. A Massive Collection of Cross-Lingual Web-Document Pairs , 2019, EMNLP.
[268] Yao Zhao,et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization , 2020, ICML.
[269] Bettina Berendt,et al. RobBERT: a Dutch RoBERTa-based Language Model , 2020, FINDINGS.
[270] Davis Liang,et al. Masked Language Model Scoring , 2019, ACL.
[271] Sameer Singh,et al. Universal Adversarial Triggers for Attacking and Analyzing NLP , 2019, EMNLP.
[272] Milan Straka,et al. RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model , 2021, TSD.
[273] Nan Duan,et al. FastSeq: Make Sequence Generation Faster , 2021, ACL.
[274] Andreas Moshovos,et al. GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[275] Omer Levy,et al. Are Sixteen Heads Really Better than One? , 2019, NeurIPS.
[276] Ulli Waltinger,et al. BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA , 2019, ArXiv.
[277] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[278] Jaewoo Kang,et al. Pre-trained Language Model for Biomedical Question Answering , 2019, PKDD/ECML Workshops.
[279] Iryna Gurevych,et al. TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning , 2021, EMNLP.
[280] James Demmel,et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes , 2019, ICLR.
[281] Catherine Havasi,et al. Representing General Relational Knowledge in ConceptNet 5 , 2012, LREC.
[282] Richard Socher,et al. TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue , 2020, EMNLP.
[283] Alessandro Moschitti,et al. Efficient pre-training objectives for Transformers , 2021, ArXiv.
[284] Jianfeng Gao,et al. DeBERTa: Decoding-enhanced BERT with Disentangled Attention , 2020, ICLR.
[285] Yang Yu,et al. TurboTransformers: an efficient GPU serving system for transformer models , 2020, PPoPP.
[286] Christophe Gravier,et al. T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples , 2018, LREC.
[287] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[288] Fahad Shahbaz Khan,et al. Transformers in Vision: A Survey , 2021, ACM Comput. Surv..
[289] Orhan Firat,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2020, ICLR.
[290] Samuel R. Bowman,et al. Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks , 2018, ArXiv.
[291] Laurent Romary,et al. CamemBERT: a Tasty French Language Model , 2019, ACL.
[292] Minlie Huang,et al. SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge , 2020, EMNLP.
[293] Masayu Leylia Khodra,et al. IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation , 2021, EMNLP.
[294] Osamu Abe,et al. KART: Privacy Leakage Framework of Language Models Pre-trained with Clinical Records , 2021, ArXiv.
[295] Bhuwan Dhingra,et al. Combating Adversarial Misspellings with Robust Word Recognition , 2019, ACL.
[296] Douglas Eck,et al. Deduplicating Training Data Makes Language Models Better , 2021, ArXiv.
[297] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[298] H. T. Kung,et al. exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources , 2020, FINDINGS.