Document Segmentation for Labeling with Academic Learning Objectives

Teaching in formal academic environments typically follows a curriculum that specifies the learning objectives to be met at each phase of a student’s academic progression. In this paper, we address the novel task of identifying the segments of educational material that are relevant to different learning objectives. Using a dynamic programming algorithm based on a vector space representation of the sentences in a document, we automatically segment documents and then label the resulting segments with learning objectives. We demonstrate the effectiveness of our approach on a real-world education dataset. We further show that our system is useful for the related tasks of document passage retrieval and question answering (QA), using a large publicly available dataset. To the best of our knowledge, we are the first to attempt the task of segmenting and labeling educational materials with academic learning objectives.
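The abstract does not state the objective that the dynamic program optimizes, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each sentence has already been mapped to a vector and that each learning objective has been embedded in the same vector space. Segmentation cost is assumed to be the within-segment squared distance to the segment mean plus a fixed per-segment `penalty`. Labeling is assumed to pick the objective with the highest cosine similarity to the segment centroid. The names `dp_segment`, `label_segments`, `segment_cost`, and `penalty` are hypothetical.

```python
import numpy as np


def segment_cost(X, i, j):
    """Assumed cost of one segment covering sentences i..j-1: the sum of
    squared Euclidean distances from each sentence vector to the segment mean."""
    seg = X[i:j]
    return float(((seg - seg.mean(axis=0)) ** 2).sum())


def dp_segment(X, penalty):
    """Exact minimum-cost segmentation by dynamic programming.

    best[j] = min over i < j of best[i] + segment_cost(i, j) + penalty.
    A larger (hypothetical) penalty yields fewer, longer segments.
    Returns a list of half-open (start, end) sentence index pairs.
    """
    n = len(X)
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)  # back[j] = start of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + segment_cost(X, i, j) + penalty
            if c < best[j]:
                best[j] = c
                back[j] = i
    # Walk the back-pointers from the end of the document to recover boundaries.
    bounds = []
    j = n
    while j > 0:
        i = int(back[j])
        bounds.append((i, j))
        j = i
    return bounds[::-1]


def label_segments(X, bounds, objectives):
    """Label each segment with the objective whose (unit-normalized) vector has
    the highest cosine similarity to the segment centroid. The argmax does not
    change if the centroid is rescaled, so only the objectives are normalized."""
    names = list(objectives)
    O = np.array([objectives[k] for k in names], dtype=float)
    O /= np.linalg.norm(O, axis=1, keepdims=True)
    labels = []
    for i, j in bounds:
        centroid = X[i:j].mean(axis=0)
        labels.append(names[int(np.argmax(O @ centroid))])
    return labels


# Toy usage: six 2-D "sentence vectors" drifting from one topic to another.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
              [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]])
bounds = dp_segment(X, penalty=0.5)
labels = label_segments(X, bounds, {"A": [1.0, 0.0], "B": [0.0, 1.0]})
print(bounds, labels)
```

On this toy input the sketch splits the six sentences into two segments of three and labels them `A` and `B`. The double loop evaluates O(n²) candidate segments; precomputing prefix sums of the vectors and their squared norms would make each `segment_cost` call constant time.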
