Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification
[1] Pengfei Liu, et al. DataLab: A Platform for Data Analysis and Intervention, 2022, ACL.
[2] Alexander M. Rush, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization, 2021, ICLR.
[3] Jinlan Fu, et al. XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation, 2021, EMNLP.
[4] Samuel R. Bowman, et al. What Will it Take to Fix Benchmarking in Natural Language Understanding?, 2021, NAACL.
[5] Jinlan Fu, et al. Towards More Fine-grained and Reliable NLP Performance Prediction, 2021, EACL.
[6] Jungo Kasai, et al. GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation, 2021, ArXiv.
[7] Nikhil Ketkar, et al. Convolutional Neural Networks, 2021, Deep Learning with Python.
[8] Yiming Yang, et al. Predicting Performance for Natural Language Processing Tasks, 2020, ACL.
[9] Pengfei Liu, et al. Extractive Summarization as Text Matching, 2020, ACL.
[10] Orhan Firat, et al. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization, 2020, ICML.
[11] Xuanjing Huang, et al. Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study, 2020, AAAI.
[12] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[13] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[14] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[15] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[16] Emily M. Bender, et al. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science, 2018, TACL.
[17] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[18] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[19] Tie-Yan Liu, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree, 2017, NIPS.
[20] Bowen Zhou, et al. A Structured Self-attentive Sentence Embedding, 2017, ICLR.
[21] Tomas Mikolov, et al. Bag of Tricks for Efficient Text Classification, 2016, EACL.
[22] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[23] Xiang Zhang, et al. Character-level Convolutional Networks for Text Classification, 2015, NIPS.
[24] Frank Hutter, et al. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves, 2015, IJCAI.
[25] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[26] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[27] Yoon Kim, et al. Convolutional Neural Networks for Sentence Classification, 2014, EMNLP.
[28] T. Chai, et al. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature, 2014.
[29] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[30] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[31] Juliane Fluck, et al. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports, 2012, J. Biomed. Informatics.
[32] Marc Dymetman, et al. Prediction of Learning Curves in Machine Translation, 2012, ACL.
[33] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.
[34] Tie-Yan Liu, et al. Learning to rank for information retrieval, 2009, SIGIR.
[35] Leif E. Peterson. K-nearest neighbor, 2009, Scholarpedia.
[36] Philipp Koehn, et al. Predicting Success in Machine Translation, 2008, EMNLP.
[37] Nello Cristianini, et al. Learning Performance of a Machine Translation System: a Statistical and Computational Analysis, 2008, WMT@ACL.
[38] Xiao Chen, et al. The Fourth International Chinese Language Processing Bakeoff: Chinese Word Segmentation, Named Entity Recognition and Chinese POS Tagging, 2008, IJCNLP.
[39] Filip Radlinski, et al. A support vector method for optimizing average precision, 2007, SIGIR.
[40] Sang Joon Kim, et al. A Mathematical Theory of Communication, 2006.
[41] Bo Pang, et al. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales, 2005, ACL.
[42] Daniel Jurafsky, et al. A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005, 2005, IJCNLP.
[43] Bing Liu, et al. Mining and summarizing customer reviews, 2004, KDD.
[44] Philipp Koehn, et al. Statistical Significance Tests for Machine Translation Evaluation, 2004, EMNLP.
[45] Johan A. K. Suykens, et al. Least Squares Support Vector Machine Classifiers, 1999, Neural Processing Letters.
[46] Erik F. Tjong Kim Sang, et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, 2003, CoNLL.
[47] Dan Roth, et al. Learning Question Classifiers, 2002, COLING.
[48] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.
[49] Jaana Kekäläinen, et al. IR evaluation methods for retrieving highly relevant documents, 2000, SIGIR '00.
[50] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[51] Vladimir Vapnik, et al. The Support Vector Method, 1997, ICANN.
[52] George R. Doddington, et al. The ATIS Spoken Language Systems Pilot Corpus, 1990, HLT.
[53] J. R. Quinlan. Probabilistic decision trees, 1990.
[54] B. Richards. Type/Token Ratios: what do they really tell us?, 1987, Journal of Child Language.
[55] D. Cox, et al. Statistical significance tests, 1982, British Journal of Clinical Pharmacology.
[56] J. H. Zar, et al. Significance Testing of the Spearman Rank Correlation Coefficient, 1972.