Zhiyuan Liu | Zhiwu Lu | Ji-Rong Wen | Ruihua Song | Jun Zhu | Yanyan Lan | Xipeng Qiu | Wayne Xin Zhao | Xu Han | Yuqi Huo | Jiezhong Qiu | Minlie Huang | Yang Liu | Xiao Liu | Jie Tang | Ning Ding | Zhengyan Zhang | Yuxian Gu | Liang Zhang | Wentao Han | Qin Jin | Jinhui Yuan
[1] Maosong Sun,et al. Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification , 2021, ACL.
[2] Xipeng Qiu,et al. A Survey of Transformers , 2021, AI Open.
[3] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ACM Comput. Surv..
[4] Hai-Tao Zheng,et al. CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding , 2021, ACL.
[5] Chang Zhou,et al. CogView: Mastering Text-to-Image Generation via Transformers , 2021, NeurIPS.
[6] Zhiyuan Liu,et al. PTR: Prompt Tuning with Rules for Text Classification , 2021, AI Open.
[7] Xipeng Qiu,et al. Token-Aware Virtual Adversarial Training in Natural Language Understanding , 2021, AAAI.
[8] Kaisheng M. Wang,et al. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation , 2021, ArXiv.
[9] Brian Lester,et al. The Power of Scale for Parameter-Efficient Prompt Tuning , 2021, EMNLP.
[10] Olatunji Ruwase,et al. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning , 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Zhilin Yang,et al. FastMoE: A Fast Mixture-of-Expert Training System , 2021, ArXiv.
[12] Hai-Tao Zheng,et al. Prototypical Representation Learning for Relation Extraction , 2021, ICLR.
[13] Zhilin Yang,et al. Controllable Generation from Pre-trained Language Models via Inverse Prompting , 2021, KDD.
[14] Zhengxiao Du,et al. GPT Understands, Too , 2021, AI Open.
[15] Zhiwu Lu,et al. WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training , 2021, ArXiv.
[16] Roy Schwartz,et al. Random Feature Attention , 2021, ICLR.
[17] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[18] Alec Radford,et al. Zero-Shot Text-to-Image Generation , 2021, ICML.
[19] Hao Zhang,et al. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models , 2021, ICML.
[20] William W. Cohen,et al. Reasoning Over Virtual Knowledge Bases With Open Predicate Relations , 2021, ICML.
[21] Zhiyuan Liu,et al. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks , 2021, Machine Intelligence Research.
[22] Olatunji Ruwase,et al. ZeRO-Offload: Democratizing Billion-Scale Model Training , 2021, USENIX ATC.
[23] Noam M. Shazeer,et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity , 2021, J. Mach. Learn. Res..
[24] Danqi Chen,et al. Making Pre-trained Language Models Better Few-shot Learners , 2021, ACL.
[25] Hua Wu,et al. ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora , 2020, EMNLP.
[26] Zhiyuan Liu,et al. ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning , 2020, ACL.
[27] Maosong Sun,et al. Towards a Universal Continuous Knowledge Base , 2020, AI Open.
[28] Huanqi Cao,et al. CPM: A Large-scale Generative Chinese Pre-trained Language Model , 2020, AI Open.
[29] Xinlei Chen,et al. Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Qun Liu,et al. Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads , 2020, AI Open.
[31] Krzysztof Choromanski,et al. Rethinking Attention with Performers , 2020, ICLR.
[32] Ken-ichi Kawarabayashi,et al. How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks , 2020, ICLR.
[33] Hinrich Schütze,et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners , 2020, NAACL.
[34] Goran Glavas,et al. Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation , 2020, EACL.
[35] Weihua Luo,et al. On Learning Universal Representations Across Languages , 2020, ICLR.
[36] Ming Zhou,et al. InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training , 2020, NAACL.
[37] Orhan Firat,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2020, ICLR.
[38] Hua Wu,et al. PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning , 2020, FINDINGS.
[39] Jianfeng Gao,et al. M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Mary Williamson,et al. Recipes for Building an Open-Domain Chatbot , 2020, EACL.
[41] Aurko Roy,et al. Efficient Content-Based Sparse Attention with Routing Transformers , 2020, TACL.
[42] Nan Duan,et al. XGPT: Cross-modal Generative Pre-Training for Image Captioning , 2020, NLPCC.
[43] Xuanjing Huang,et al. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters , 2020, FINDINGS.
[44] Zhiyuan Liu,et al. KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation , 2019, Transactions of the Association for Computational Linguistics.
[45] Zhiyuan Liu,et al. Adversarial Language Games for Advanced Natural Language Intelligence , 2019, AAAI.
[46] Wanxiang Che,et al. Pre-Training with Whole Word Masking for Chinese BERT , 2019, ArXiv.
[47] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[48] Zhilin Yang,et al. All NLP Tasks Are Generation Tasks: A General Pretraining Framework , 2021, ArXiv.
[49] Maosong Sun,et al. Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning , 2020, ArXiv.
[50] Xuanjing Huang,et al. Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces , 2020, ArXiv.
[51] Minjia Zhang,et al. Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping , 2020, NeurIPS.
[52] Bin Dai,et al. Further Analysis of Outlier Detection with Deep Generative Models , 2020, NeurIPS.
[53] Dawn Song,et al. Language Models are Open Knowledge Graphs , 2020, ArXiv.
[54] Wenhu Chen,et al. KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation , 2020, EMNLP.
[55] Zheng Zhang,et al. CoLAKE: Contextualized Language and Knowledge Embedding , 2020, COLING.
[56] Qun Liu,et al. TernaryBERT: Distillation-aware Ultra-low Bit BERT , 2020, EMNLP.
[57] Olatunji Ruwase,et al. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters , 2020, KDD.
[58] Tie-Yan Liu,et al. Variance-reduced Language Pretraining via a Mask Proposal Network , 2020, ArXiv.
[59] Minlie Huang,et al. A Large-Scale Chinese Short-Text Conversation Dataset , 2020, NLPCC.
[60] M. Zaheer,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[61] Samuel R. Bowman,et al. Can neural networks acquire a structural bias from raw linguistic data? , 2020, CogSci.
[62] William W. Cohen,et al. Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge , 2020, ArXiv.
[63] Felice Dell'Orletta,et al. Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation , 2020, REPL4NLP.
[64] Paul N. Bennett,et al. Knowledge-Aware Language Model Pretraining , 2020, ArXiv.
[65] Nikolaos Pappas,et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention , 2020, ICML.
[66] Jie Tang,et al. Self-Supervised Learning: Generative or Contrastive , 2020, IEEE Transactions on Knowledge and Data Engineering.
[67] Quoc V. Le,et al. Rethinking Pre-training and Self-training , 2020, NeurIPS.
[68] Omer Levy,et al. Emergent linguistic structure in artificial neural networks trained by self-supervision , 2020, Proceedings of the National Academy of Sciences.
[69] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[70] Fabio Petroni,et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , 2020, NeurIPS.
[71] Anna Rumshisky,et al. When BERT Plays the Lottery, All Tickets Are Winning , 2020, EMNLP.
[72] Qun Liu,et al. Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT , 2020, ACL.
[73] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[74] Zhiyuan Liu,et al. Train No Evil: Selective Masking for Task-guided Pre-training , 2020, EMNLP.
[75] Tie-Yan Liu,et al. MPNet: Masked and Permuted Pre-training for Language Understanding , 2020, NeurIPS.
[76] Li Yang,et al. ETC: Encoding Long and Structured Inputs in Transformers , 2020, EMNLP.
[77] Eunsol Choi,et al. Entities as Experts: Sparse Memory Access with Entity Supervision , 2020, EMNLP.
[78] Chenliang Li,et al. PALM: Pre-training an Autoencoding&autoregressive Language Model for Context-conditioned Generation , 2020, EMNLP.
[79] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[80] Arman Cohan,et al. Longformer: The Long-Document Transformer , 2020, ArXiv.
[81] Shuangzhi Wu,et al. Alternating Language Modeling for Cross-Lingual Pre-Training , 2020, AAAI.
[82] Xipeng Qiu,et al. BERT-ATTACK: Adversarial Attack against BERT Using BERT , 2020, EMNLP.
[83] Quoc V. Le,et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.
[84] Xipeng Qiu,et al. Pre-trained models for natural language processing: A survey , 2020, Science China Technological Sciences.
[85] Gu Jin,et al. SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping , 2020, ASPLOS.
[86] Li Dong,et al. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers , 2020, NeurIPS.
[87] Graham Neubig,et al. Differentiable Reasoning over a Virtual Knowledge Base , 2020, ICLR.
[88] Mitchell A. Gordon,et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning , 2020, REPL4NLP.
[89] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[90] Colin Raffel,et al. How Much Knowledge Can You Pack into the Parameters of a Language Model? , 2020, EMNLP.
[91] Ming-Wei Chang,et al. REALM: Retrieval-Augmented Language Model Pre-Training , 2020, ICML.
[92] Regina Barzilay,et al. Blank Language Models , 2020, EMNLP.
[93] David Vilares,et al. Parsing as Pretraining , 2020, AAAI.
[94] Jihun Choi,et al. Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction , 2020, ICLR.
[95] Quoc V. Le,et al. Towards a Human-like Open-Domain Chatbot , 2020, ArXiv.
[96] Wei-Tsung Kao,et al. Further Boosting BERT-based Models by Duplicating Existing Layers: Some Intriguing Phenomena inside BERT , 2020, ArXiv.
[97] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[98] Lin Su,et al. ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data , 2020, ArXiv.
[99] Marjan Ghazvininejad,et al. Multilingual Denoising Pre-training for Neural Machine Translation , 2020, Transactions of the Association for Computational Linguistics.
[100] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[101] Minlie Huang,et al. A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation , 2020, TACL.
[102] Wenhan Xiong,et al. Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model , 2019, ICLR.
[103] Peter J. Liu,et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization , 2019, ICML.
[104] Marcus Rohrbach,et al. 12-in-1: Multi-Task Vision and Language Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[105] Steven Schockaert,et al. Inducing Relational Knowledge from BERT , 2019, AAAI.
[106] Frank F. Xu,et al. How Can We Know What Language Models Know? , 2019, Transactions of the Association for Computational Linguistics.
[107] Ross B. Girshick,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[108] Hinrich Schütze,et al. E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT , 2019, FINDINGS.
[109] Minlie Huang,et al. SentiLARE: Linguistic Knowledge Enhanced Language Representation for Sentiment Analysis , 2019, EMNLP.
[110] Myle Ott,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[111] J. Weston,et al. Adversarial NLI: A New Benchmark for Natural Language Understanding , 2019, ACL.
[112] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[113] Meng Zhang,et al. Textual Adversarial Attack as Combinatorial Optimization , 2019, ArXiv.
[114] Lin-Shan Lee,et al. SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering , 2019, INTERSPEECH.
[115] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[116] Lei Yu,et al. A Mutual Information Maximization Perspective of Language Representation Learning , 2019, ICLR.
[117] Hua Wu,et al. PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable , 2019, ACL.
[118] Samyam Rajbhandari,et al. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models , 2019, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[119] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[120] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[121] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[122] Jason J. Corso,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2019, AAAI.
[123] Li Dong,et al. Cross-Lingual Natural Language Generation via Pre-Training , 2019, AAAI.
[124] Xin Jiang,et al. TinyBERT: Distilling BERT for Natural Language Understanding , 2019, FINDINGS.
[125] Zhe Zhao,et al. K-BERT: Enabling Language Representation with Knowledge Graph , 2019, AAAI.
[126] Michael W. Mahoney,et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT , 2019, AAAI.
[127] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[128] Nan Duan,et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training , 2019, AAAI.
[129] Allyson Ettinger,et al. What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models , 2019, TACL.
[130] Hao Tian,et al. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding , 2019, AAAI.
[131] Joey Tianyi Zhou,et al. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment , 2019, AAAI.
[132] Omer Levy,et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans , 2019, TACL.
[133] Rémi Gribonval,et al. And the Bit Goes Down: Revisiting the Quantization of Neural Networks , 2019, ICLR.
[134] Ning Chen,et al. Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness , 2019, ICLR.
[135] James Demmel,et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes , 2019, ICLR.
[136] Jaewoo Kang,et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform..
[137] Yibo Zhu,et al. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters , 2020, OSDI.
[138] Chang Zhou,et al. CogLTX: Applying BERT to Long Texts , 2020, NeurIPS.
[139] Vinh Phu Nguyen,et al. Merlin: A GPU Accelerated Recommendation Framework , 2020.
[140] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[141] Shikha Bordia,et al. Do Attention Heads in BERT Track Syntactic Dependencies? , 2019, ArXiv.
[142] M. de Rijke,et al. Understanding Multi-Head Attention in Abstractive Summarization , 2019, ArXiv.
[143] Yibo Zhu,et al. A generic communication scheduler for distributed DNN training acceleration , 2019, SOSP.
[144] Nikhil R. Devanur,et al. PipeDream: generalized pipeline parallelism for DNN training , 2019, SOSP.
[145] Moshe Wasserblat,et al. Q8BERT: Quantized 8Bit BERT , 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[146] Jungo Kasai,et al. Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations , 2019, EMNLP.
[147] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[148] M. Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[149] Sameer Singh,et al. Do NLP Models Know Numbers? Probing Numeracy in Embeddings , 2019, EMNLP.
[150] Noah A. Smith,et al. Knowledge Enhanced Contextual Word Representations , 2019, EMNLP.
[151] Ming Zhou,et al. Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks , 2019, EMNLP.
[152] Kawin Ethayarajh,et al. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings , 2019, EMNLP.
[153] Sebastian Riedel,et al. Language Models as Knowledge Bases? , 2019, EMNLP.
[154] Alexander M. Rush,et al. Commonsense Knowledge Mining from Pretrained Models , 2019, EMNLP.
[155] Xiaozhe Ren,et al. NEZHA: Neural Contextualized Representation for Chinese Language Understanding , 2019, ArXiv.
[156] Yu Cheng,et al. Patient Knowledge Distillation for BERT Model Compression , 2019, EMNLP.
[157] Anna Rumshisky,et al. Revealing the Dark Secrets of BERT , 2019, EMNLP.
[158] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[159] Sameer Singh,et al. Universal Adversarial Triggers for Attacking and Analyzing NLP , 2019, EMNLP.
[160] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[161] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[162] Tal Linzen,et al. Quantity doesn’t buy quality syntax with neural language models , 2019, EMNLP.
[163] David Reitter,et al. Fusion of Detected Objects in Text for Visual Question Answering , 2019, EMNLP.
[164] Yejin Choi,et al. Do Neural Language Representations Learn Physical Commonsense? , 2019, CogSci.
[165] Benoît Sagot,et al. What Does BERT Learn about the Structure of Language? , 2019, ACL.
[166] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[167] Rui Li,et al. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs , 2019, KDD.
[168] Hung-Yu Kao,et al. Probing Neural Network Comprehension of Natural Language Arguments , 2019, ACL.
[169] Guillaume Lample,et al. Large Memory Layers with Product Keys , 2019, NeurIPS.
[170] Rudolf Rosa,et al. Inducing Syntactic Trees from BERT Representations , 2019, ArXiv.
[171] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[172] Yejin Choi,et al. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction , 2019, ACL.
[173] Omer Levy,et al. What Does BERT Look at? An Analysis of BERT’s Attention , 2019, BlackboxNLP@ACL.
[174] Jeffrey Ling,et al. Matching the Blanks: Distributional Similarity for Relation Learning , 2019, ACL.
[175] Robert Frank,et al. Open Sesame: Getting inside BERT’s Linguistic Knowledge , 2019, BlackboxNLP@ACL.
[176] Eva Schlinger,et al. How Multilingual is Multilingual BERT? , 2019, ACL.
[177] Christopher D. Manning,et al. A Structural Probe for Finding Syntax in Word Representations , 2019, NAACL.
[178] Di He,et al. Efficient Training of BERT by Progressively Stacking , 2019, ICML.
[179] Fedor Moiseev,et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned , 2019, ACL.
[180] Maosong Sun,et al. ERNIE: Enhanced Language Representation with Informative Entities , 2019, ACL.
[181] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[182] Alex Wang,et al. What do you learn from context? Probing for sentence structure in contextualized word representations , 2019, ICLR.
[183] Chang Zhou,et al. Cognitive Graph for Multi-Hop Reading Comprehension at Scale , 2019, ACL.
[184] Xiaodong Liu,et al. Unified Language Model Pre-training for Natural Language Understanding and Generation , 2019, NeurIPS.
[185] Xu Tan,et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation , 2019, ICML.
[186] Omer Levy,et al. Are Sixteen Heads Really Better than One? , 2019, NeurIPS.
[187] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[188] Yu Sun,et al. ERNIE: Enhanced Representation through Knowledge Integration , 2019, ArXiv.
[189] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[190] Iz Beltagy,et al. SciBERT: A Pretrained Language Model for Scientific Text , 2019, EMNLP.
[191] Yonatan Belinkov,et al. Linguistic Knowledge and Transferability of Contextual Representations , 2019, NAACL.
[192] Mikhail Khodak,et al. A Theoretical Analysis of Contrastive Unsupervised Representation Learning , 2019, ICML.
[193] Christopher D. Manning,et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[194] Zheng Zhang,et al. Star-Transformer , 2019, NAACL.
[195] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[196] Yoav Goldberg,et al. Assessing BERT's Syntactic Abilities , 2019, ArXiv.
[197] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[198] Mikhail Belkin,et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.
[199] Kaiming He,et al. Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[200] Quoc V. Le,et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism , 2018, ArXiv.
[201] Yee Whye Teh,et al. Set Transformer , 2018, ICML.
[202] Eric Wallace,et al. Trick Me If You Can: Human-in-the-Loop Generation of Adversarial Examples for Question Answering , 2018, TACL.
[203] Minjie Wang,et al. Supporting Very Large Models using Automatic Dataflow Graph Partitioning , 2018, EuroSys.
[204] Alexander Aiken,et al. Beyond Data and Model Parallelism for Deep Neural Networks , 2018, SysML.
[205] Sangeetha Abdu Jyothi,et al. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling , 2018, MLSys.
[206] Torsten Hoefler,et al. Demystifying Parallel and Distributed Deep Learning , 2018, ACM Comput. Surv..
[207] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[208] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[209] Dustin Tran,et al. Mesh-TensorFlow: Deep Learning for Supercomputers , 2018, NeurIPS.
[210] Guillaume Lample,et al. XNLI: Evaluating Cross-lingual Sentence Representations , 2018, EMNLP.
[211] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[212] Guillaume Lample,et al. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties , 2018, ACL.
[213] Aleksander Madry,et al. Adversarially Robust Generalization Requires More Data , 2018, NeurIPS.
[215] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[216] Dan Alistarh,et al. Model compression via distillation and quantization , 2018, ICLR.
[217] Sebastian Ruder,et al. Universal Language Model Fine-tuning for Text Classification , 2018, ACL.
[218] Pietro Liò,et al. Graph Attention Networks , 2017, ICLR.
[219] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[220] Max Welling,et al. Modeling Relational Data with Graph Convolutional Networks , 2017, ESWC.
[221] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018.
[222] Jun Zhu,et al. ZhuSuan: A Library for Bayesian Deep Learning , 2017, ArXiv.
[223] Yang You,et al. Scaling SGD Batch Size to 32K for ImageNet Training , 2017, ArXiv.
[224] Richard Socher,et al. Learned in Translation: Contextualized Word Vectors , 2017, NIPS.
[225] Léon Bottou,et al. Wasserstein Generative Adversarial Networks , 2017, ICML.
[226] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[227] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[228] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[229] Xiaogang Wang,et al. Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[230] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[231] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[232] Yonatan Belinkov,et al. Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks , 2016, ICLR.
[233] Xing Shi,et al. Does String-Based Neural MT Learn Source Syntax? , 2016, EMNLP.
[234] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[235] Allyson Ettinger,et al. Probing for semantic evidence of composition by means of simple classification tasks , 2016, RepEval@ACL.
[236] Ido Dagan,et al. context2vec: Learning Generic Context Embedding with Bidirectional LSTM , 2016, CoNLL.
[237] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[238] Xuanjing Huang,et al. Recurrent Neural Network for Text Classification with Multi-Task Learning , 2016, IJCAI.
[239] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[240] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[241] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[242] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[243] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[245] Arne Köhn,et al. What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation , 2015, EMNLP.
[246] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[247] Andrew Zisserman,et al. Spatial Transformer Networks , 2015, NIPS.
[248] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[249] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[250] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[251] Nikos Komodakis,et al. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[252] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[253] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[254] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[255] Vibhav Vineet,et al. Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[256] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[257] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[258] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[259] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[260] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[261] Zhuowen Tu,et al. Deeply-Supervised Nets , 2014, AISTATS.
[262] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[263] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[264] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[265] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[266] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[267] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[268] Philipp Koehn,et al. Findings of the 2014 Workshop on Statistical Machine Translation , 2014, WMT@ACL.
[269] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[270] Phil Blunsom,et al. A Convolutional Neural Network for Modelling Sentences , 2014, ACL.
[271] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.
[272] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[273] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[274] Geoffrey Zweig,et al. Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.
[275] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[276] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[277] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[278] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[279] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[280] Yoshua Bengio,et al. Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.
[281] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[282] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[283] A. Kiureghian,et al. Aleatory or epistemic? Does it matter? , 2009.
[284] Changshui Zhang,et al. Transferred Dimensionality Reduction , 2008, ECML/PKDD.
[285] Jiawei Han,et al. Knowledge transfer via multiple model local structure mapping , 2008, KDD.
[286] Qiang Yang,et al. Self-taught clustering , 2008, ICML '08.
[287] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[288] M. D’Esposito. Working memory , 2008, Handbook of Clinical Neurology.
[289] Edwin V. Bonilla,et al. Multi-task Gaussian Process Prediction , 2007, NIPS.
[290] Qiang Yang,et al. Co-clustering based classification for out-of-domain documents , 2007, KDD '07.
[291] Raymond J. Mooney,et al. Mapping and Revising Markov Logic Networks for Transfer Learning , 2007, AAAI.
[292] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.
[293] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[294] Daniel Marcu,et al. Domain Adaptation for Statistical Classifiers , 2006, J. Artif. Intell. Res..
[295] Tong Zhang,et al. A High-Performance Semi-Supervised Learning Method for Text Chunking , 2005, ACL.
[296] Massimiliano Pontil,et al. Regularized multi--task learning , 2004, KDD.
[297] Bianca Zadrozny,et al. Learning and evaluating classifiers under sample selection bias , 2004, ICML.
[298] Neil D. Lawrence,et al. Learning to learn with the informative vector machine , 2004, ICML.
[299] P. Barrouillet,et al. Time constraints and resource sharing in adults' working memory spans , 2004, Journal of Experimental Psychology: General.
[300] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[301] H. Shimodaira,et al. Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000.
[302] Sebastian Thrun,et al. Learning to Learn: Introduction and Overview , 1998, Learning to Learn.
[303] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[304] T. E. Lange,et al. Below the Surface: Analogical Similarity and Retrieval Competition in Reminding , 1994, Cognitive Psychology.
[305] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[306] John Brown. Some Tests of the Decay Theory of Immediate Memory , 1958.
[307] Wilson L. Taylor,et al. “Cloze Procedure”: A New Tool for Measuring Readability , 1953.