Towards a Reliable and Robust Methodology for Crowd-Based Subjective Quality Assessment of Query-Based Extractive Text Summarization