Annotation and Classification of Sentence-level Revision Improvement

Studies of writing revisions rarely focus on revision quality. To address this issue, we introduce a corpus of between-draft revisions of student argumentative essays, annotated as to whether each revision improves essay quality. We demonstrate a potential usage of our annotations by developing a machine learning model to predict revision improvement. With the goal of expanding training data, we also extract revisions from a dataset edited by expert proofreaders. Our results indicate that blending expert and non-expert revisions increases model performance, with expert data particularly important for predicting low-quality revisions.
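The abstract does not name the model or its features. The sketch below only illustrates the general shape of the prediction task. Each revision is an aligned pair of sentences from two drafts, carrying a binary label for whether it improves the essay. "Blending" is read here as concatenating the non-expert (student) and expert-proofread revisions into a single training pool. Several parts are assumptions and are not the authors' method: the nearest-centroid classifier, the two placeholder features (scaled length change and token overlap), and every name (`Revision`, `tokens`, `features`, `train`, `predict`). The toy rows are also invented for illustration and do not come from the annotated corpus.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Revision:
    old: str         # sentence in the earlier draft ("" if the sentence was added)
    new: str         # sentence in the later draft ("" if the sentence was deleted)
    label: int = -1  # 1 = improves the essay, 0 = does not, -1 = unlabeled


def tokens(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())


def features(rev: Revision) -> tuple[float, float]:
    # Hypothetical placeholder features, NOT the paper's feature set:
    # scaled change in sentence length and token-set (Jaccard) overlap.
    old, new = tokens(rev.old), tokens(rev.new)
    union = set(old) | set(new)
    overlap = len(set(old) & set(new)) / len(union) if union else 1.0
    return ((len(new) - len(old)) / 10.0, overlap)


def train(revisions: list[Revision]) -> dict[int, tuple[float, float]]:
    # Nearest-centroid model: the mean feature vector of each label.
    sums: dict[int, list[float]] = {}
    counts: dict[int, int] = {}
    for rev in revisions:
        acc = sums.setdefault(rev.label, [0.0, 0.0])
        for i, value in enumerate(features(rev)):
            acc[i] += value
        counts[rev.label] = counts.get(rev.label, 0) + 1
    return {lab: (acc[0] / counts[lab], acc[1] / counts[lab]) for lab, acc in sums.items()}


def predict(model: dict[int, tuple[float, float]], rev: Revision) -> int:
    # Assign the label whose centroid is closest in Euclidean distance.
    x = features(rev)
    return min(model, key=lambda lab: math.dist(x, model[lab]))


# Invented toy rows for illustration only; none come from the annotated corpus.
student = [
    Revision("The law is bad.", "The law is harmful because it limits free speech.", 1),
    Revision("Many people agree with this idea.", "Many people agree.", 0),
]
expert = [
    Revision("Their are several reason.", "There are several reasons.", 1),
    Revision("Schools should ban phones in class.", "Phones.", 0),
]

# Blending: pool non-expert and expert revisions into one training set.
model = train(student + expert)
print(predict(model, Revision("Cars pollute.", "Cars pollute the air in big cities.")))  # → 1
```

To compare data sources in this setup, train one model on `student` alone and another on `student + expert`, then evaluate both on held-out student revisions. A realistic system would need much richer features than these two toy ones.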

Diane J. Litman | Tazin Afrin
