Automatic Metrics for Genre-specific Text Quality

To date, researchers have proposed various ways to compute the readability and coherence of a text using lexical, syntactic, entity, and discourse properties. These metrics, however, have been proposed as general indicators of writing quality rather than defined with reference to any particular genre. In this thesis, we propose and evaluate novel text quality metrics that exploit the unique properties of different genres. We focus on three genres: academic publications, news articles about science, and machine-generated text, in particular the output of automatic text summarization systems.
