Distinguishing different types of conference submissions: the ACL case study

Many AI and NLP conferences call for both long and short papers, and satellite workshops are co-located with the main conference. In this work, we focus on distinguishing among full, short, and workshop papers submitted to recent ACL conferences. We propose a framework that takes into account both the metadata and the content of a paper. To extract metadata, we devised a full-fledged paper parser. For distinguishing full from workshop papers, our SVM models outperform the only previously published results by at least 3.6%. Metadata (number of tables and formulas), syntactic features (syntactic complexity), and term TF-IDF scores distinguish full from short papers, whereas topic also distinguishes full from workshop papers.
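
To illustrate how content and metadata features of the kind mentioned above might be combined into a single vector, here is a minimal Python sketch. It is not the authors' pipeline: the toy texts, the whitespace tokenization, the particular TF-IDF variant (relative term frequency times log inverse document frequency), and the names `tf_idf` and `feature_vector` are all assumptions made for illustration. The resulting vectors could then be passed to an SVM implementation of one's choice, which is omitted here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def feature_vector(tfidf, vocabulary, n_tables, n_formulas):
    """Concatenate TF-IDF weights (in vocabulary order) with metadata counts."""
    content = [tfidf.get(term, 0.0) for term in vocabulary]
    return content + [float(n_tables), float(n_formulas)]


# Hypothetical toy data; real inputs would come from a paper parser.
papers = [
    {"text": "we prove a theorem and report results in a table",
     "tables": 4, "formulas": 9},
    {"text": "we report preliminary results",
     "tables": 1, "formulas": 0},
]

tokenized = [p["text"].split() for p in papers]
weights = tf_idf(tokenized)
vocabulary = sorted({term for tokens in tokenized for term in tokens})
vectors = [
    feature_vector(w, vocabulary, p["tables"], p["formulas"])
    for w, p in zip(weights, papers)
]
```

In this sketch, terms that occur in every document receive a weight of zero, so only terms that discriminate between papers contribute to the content part of the vector. The metadata counts are appended unscaled, and in practice they would likely need normalization before training a classifier.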
