Critique Style Guide: Improving Crowdsourced Design Feedback with a Natural Language Model

Designers are increasingly leveraging online crowds, yet online contributors may lack the expertise, context, and sensitivity to provide effective critique. Rubrics help feedback providers but require domain experts to write them and may not generalize across design domains. This paper introduces and tests a novel semi-automated method to support feedback providers by analyzing feedback language. In our first study, 52 students from two design courses created design solutions and received feedback from 176 online providers. Instructors, students, and crowd contributors rated the helpfulness of each feedback response. From this data, an algorithm extracted a set of natural language features (e.g., specificity, sentiment) that correlated with the ratings. The features accurately predicted the ratings and remained stable across different raters and design solutions. Based on these features, we produced a critique style guide with feedback examples, automatically selected for each feature, to help providers revise their feedback through self-assessment. In a second study, we tested the validity of the guide through a between-subjects experiment (n=50). Providers wrote feedback on design solutions with or without the guide; those who used our style-based guidance generated feedback with higher perceived helpfulness.
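The pipeline the abstract describes, extracting natural language features from each feedback response and predicting its rated helpfulness, can be illustrated with a minimal sketch. This is not the paper's implementation: the specific features (word count, sentiment polarity, question count) and the random-forest regressor are assumptions chosen only to show the shape of the approach.

```python
# Minimal sketch: map feedback text to language features, then fit a model
# that predicts helpfulness ratings. Feature choices and the regressor are
# illustrative assumptions, not the paper's exact method.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.ensemble import RandomForestRegressor

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

def extract_features(feedback: str) -> list[float]:
    """Turn one feedback response into a small vector of language features."""
    words = feedback.split()
    polarity = sia.polarity_scores(feedback)
    return [
        len(words),            # length, a crude proxy for specificity
        polarity["compound"],  # overall sentiment of the critique
        feedback.count("?"),   # questions posed to the designer
    ]

def fit_helpfulness_model(feedback_texts, ratings):
    """Fit a regressor that predicts rated helpfulness from language features."""
    X = [extract_features(text) for text in feedback_texts]
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, ratings)
    return model
```

With such a model, feedback responses that score high on a given feature could be surfaced as the per-feature examples in a critique style guide, which is the role the automatically selected examples play in the paper.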
