Exploring the Feasibility of Automatically Rating Online Article Quality

We demonstrate the feasibility of building an automatic system that assigns quality ratings to articles in Wikipedia, the online encyclopedia. Our preliminary system uses a Maximum Entropy classification model trained on articles hand-tagged for quality. This simple system already produces very good results, and significant avenues for improvement remain to be explored.
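As a rough illustration of the kind of model named above, the following is a minimal sketch of a Maximum Entropy (multinomial logistic regression) classifier trained by stochastic gradient ascent. It is not the system described here: the feature set (a bias term, a scaled article length, and a scaled reference count), the two quality classes, the toy data, and all function names are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(weights, x):
    """Dot product of each class's weight vector with the feature vector."""
    return [sum(w * f for w, f in zip(row, x)) for row in weights]

def train_maxent(examples, labels, num_classes, epochs=200, lr=0.1, l2=0.001):
    """Fit one weight vector per class by stochastic gradient ascent on the
    L2-regularized conditional log-likelihood."""
    num_features = len(examples[0])
    weights = [[0.0] * num_features for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            probs = softmax(class_scores(weights, x))
            for c in range(num_classes):
                # Gradient term: observed indicator minus model expectation.
                err = (1.0 if c == y else 0.0) - probs[c]
                for j in range(num_features):
                    weights[c][j] += lr * (err * x[j] - l2 * weights[c][j])
    return weights

def predict(weights, x):
    """Return (most probable class index, class probabilities)."""
    probs = softmax(class_scores(weights, x))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Hypothetical toy data: [bias, scaled length, scaled reference count].
# Class 0 = lower-quality article, class 1 = higher-quality article (assumed labels).
examples = [
    [1.0, 0.10, 0.00], [1.0, 0.20, 0.10], [1.0, 0.15, 0.05],
    [1.0, 0.90, 0.80], [1.0, 0.80, 0.90], [1.0, 0.95, 0.70],
]
labels = [0, 0, 0, 1, 1, 1]

weights = train_maxent(examples, labels, num_classes=2)
label, probs = predict(weights, [1.0, 0.9, 0.9])
print(label, probs)
```

A real system would replace the toy features with ones extracted from article text and would likely use a batch optimizer rather than this simple per-example update.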
