Crowdsourced Comprehension: Predicting Prerequisite Structure in Wikipedia

The growth of open-access technical publications and other open-domain textual sources means that an increasing amount of online technical material is in principle available to all but in practice incomprehensible to most. We propose to help readers comprehend complex technical material by using statistical methods to model the "prerequisite structure" of a corpus, i.e., the semantic impact of documents on an individual reader's state of knowledge. Experimental results using Wikipedia as the corpus suggest that this task can be approached by crowdsourcing ground-truth labels of prerequisite structure and then generalizing these labels with a learned classifier that combines several kinds of signals. The features we consider relate pairs of pages by analyzing not only the text of the pages but also how the containing corpus is connected and created.
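
To make the pairwise setup concrete, the following is a minimal sketch of a prerequisite classifier over page pairs. It is not the authors' implementation: the abstract only says that textual, connectivity, and authorship signals are combined, so the specific features here (link direction and Jaccard word overlap), the plain logistic model, the toy pages, and the labels are all assumptions made for illustration.

```python
import math

# Hypothetical toy corpus: each page has a bag of words and outgoing links.
PAGES = {
    "Gradient descent": {"words": {"optimization", "derivative", "function", "minimum"},
                         "links": {"Derivative", "Function"}},
    "Derivative": {"words": {"function", "limit", "slope", "derivative"},
                   "links": {"Function", "Limit"}},
    "Function": {"words": {"function", "set", "mapping"}, "links": {"Set"}},
    "Limit": {"words": {"limit", "function", "sequence"}, "links": {"Function"}},
    "Set": {"words": {"set", "element"}, "links": set()},
}

# Hypothetical crowd labels: (target, candidate, 1 if candidate is a prerequisite of target).
LABELED_PAIRS = [
    ("Gradient descent", "Derivative", 1),
    ("Derivative", "Function", 1),
    ("Derivative", "Limit", 1),
    ("Function", "Set", 1),
    ("Function", "Derivative", 0),
    ("Derivative", "Gradient descent", 0),
    ("Set", "Gradient descent", 0),
]


def jaccard(a, b):
    """Word-set overlap between two pages (an assumed textual signal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(pages, target, candidate):
    """Feature vector for the question: is `candidate` a prerequisite of `target`?"""
    t, c = pages[target], pages[candidate]
    return [
        1.0,                                      # bias term
        1.0 if candidate in t["links"] else 0.0,  # target links to candidate
        1.0 if target in c["links"] else 0.0,     # candidate links to target
        jaccard(t["words"], c["words"]),          # textual similarity
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Estimated probability that the pair is a prerequisite relation."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = label - predict(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


X = [pair_features(PAGES, t, c) for t, c, _ in LABELED_PAIRS]
y = [label for _, _, label in LABELED_PAIRS]
weights = train_logistic(X, y)

# Score a pair that was not labeled: is "Function" a prerequisite of "Limit"?
held_out = predict(weights, pair_features(PAGES, "Limit", "Function"))
print(f"P(Function is a prerequisite of Limit) = {held_out:.2f}")
```

In this toy data the link-direction feature alone separates the labels, which is unlikely to hold on real Wikipedia pages; a practical system would presumably need richer features and held-out evaluation against crowd labels.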
