Data mining for improving textbooks

We present early explorations into a data-mining-based approach for enhancing the quality of textbooks. We describe a diagnostic tool that algorithmically identifies deficient sections in textbooks. We also discuss techniques for algorithmically augmenting textbook sections with links to selected content mined from the Web. Our evaluation, conducted on widely used textbooks from India, indicates that technological approaches to improving textbooks hold promise.
