Wise Crowd Content Assessment and Educational Rubrics

Development of reliable rubrics for educational intervention studies that address reading and writing skills is labor-intensive and could benefit from automation. We compare a main ideas rubric used in a successful writing intervention study with a highly reliable wise-crowd content assessment method originally developed to evaluate machine-generated summaries. The ideas in the educational rubric were extracted from a source text that students were asked to summarize. The wise-crowd content assessment model is derived from summaries written by an independent group of proficient students who read the same source text and followed the same instructions to write their summaries. The resulting content model includes a ranking over the derived content units. All main ideas in the rubric appear prominently in the wise-crowd content model. We present two methods that automate the content assessment. Scores based on the wise-crowd content assessment, both manual and automated, correlate highly with the main ideas rubric. The automated content assessment methods have several advantages over related methods: high correlations with the corresponding manual scores, a need for only half a dozen models instead of hundreds, and interpretable scores that independently assess content quality and coverage.
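
The scoring logic summarized above follows the pyramid-style idea of weighting each content unit by how many wise-crowd summaries express it, then deriving two independently interpretable scores for a target summary: quality (how well-chosen its content is for its length) and coverage (how much of the expected content it includes). The sketch below is a minimal illustration of that arithmetic only, not the paper's implementation; it assumes content units have already been identified and matched to the target summary, and the function name score_summary and the toy unit identifiers are hypothetical.

```python
"""Minimal sketch of wise-crowd (pyramid-style) content scoring.

Assumptions (not the paper's implementation): content units have already been
identified and matched against the target summary; inputs are plain Python
sets of unit identifiers.
"""
from collections import Counter
from typing import Iterable, Set, Tuple


def score_summary(model_units: Iterable[Set[str]],
                  target_units: Set[str]) -> Tuple[float, float]:
    """Return (quality, coverage) for a target summary.

    model_units:  one set of content-unit ids per wise-crowd model summary.
    target_units: content-unit ids the target summary expresses.
    """
    model_units = list(model_units)

    # Weight each content unit by how many model summaries express it.
    weights = Counter(unit for summary in model_units for unit in summary)

    # Raw score: total weight of the units the target summary expresses.
    raw = sum(weights[u] for u in target_units if u in weights)

    # Unit weights sorted highest first, for computing ideal reference scores.
    sorted_weights = sorted(weights.values(), reverse=True)

    # Quality: raw score relative to the best score attainable with the
    # same number of expressed units (content selection quality).
    n_expressed = len(target_units & set(weights))
    max_for_size = sum(sorted_weights[:n_expressed]) or 1
    quality = raw / max_for_size

    # Coverage: raw score relative to the best score attainable with the
    # average number of units per model summary (content coverage).
    avg_size = round(sum(len(s) for s in model_units) / len(model_units))
    max_for_avg = sum(sorted_weights[:avg_size]) or 1
    coverage = raw / max_for_avg

    return quality, coverage


if __name__ == "__main__":
    # Toy example: four wise-crowd model summaries, one target summary.
    models = [{"u1", "u2", "u3"}, {"u1", "u2"},
              {"u1", "u3", "u4"}, {"u1", "u2", "u4"}]
    target = {"u1", "u4"}
    print(score_summary(models, target))  # quality ~0.86, coverage ~0.67
```

In this toy run the target expresses a highly weighted unit and a low-weighted one, so its quality score is high while its coverage score is lower, illustrating how the two scores can diverge.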
