Paper information - Distinguishing different types of conference submissions: the ACL case study

Many conferences in AI and NLP call for both long and short papers, and satellite workshops are co-located with the main conference. In this work, we focus on distinguishing full papers, short papers, and workshop papers submitted to some recent ACL conferences. We propose a framework that takes into account both the metadata and the content of a paper. To extract metadata, we devised a full-fledged paper parser. In distinguishing full from workshop papers, our SVM models outperform the only previously published results by at least 3.6%. Metadata (number of tables and formulas), syntactic features (syntactic complexity), and term TF-IDF scores distinguish full from short papers, whereas topic also distinguishes full from workshop papers.
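To make the feature types named in the abstract concrete, the following is a minimal sketch of how term TF-IDF scores might be combined with metadata counts into a single feature vector. It is not the authors' implementation: the weighting variant (length-normalised term frequency times natural-log inverse document frequency), the function names `tf_idf` and `feature_vector`, and the toy token lists are all assumptions made for illustration. The SVM classifier itself is omitted.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def feature_vector(tfidf_scores, vocabulary, n_tables, n_formulas):
    """Concatenate metadata counts with TF-IDF scores over a fixed vocabulary."""
    metadata = [float(n_tables), float(n_formulas)]
    return metadata + [tfidf_scores.get(term, 0.0) for term in vocabulary]


# Toy, made-up token lists standing in for two parsed papers.
papers = [
    "we prove a theorem and evaluate the parser".split(),
    "we sketch a preliminary idea".split(),
]
scores = tf_idf(papers)
vocab = sorted({term for doc in papers for term in doc})

# Hypothetical metadata counts for the first paper.
x_full = feature_vector(scores[0], vocab, n_tables=4, n_formulas=6)
```

Under this variant, terms that occur in every document receive a score of zero, so only terms that separate the documents contribute. A vector such as `x_full` could then be passed to any standard classifier.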

Barbara Di Eugenio | Shuyang Lin | Hong Wang | Clement Yu
