Mining multiple informational text structure from text data

Abstract This study aimed to distinguish the types of informational text structure present in text data. Classifying the informational text structure of a given text is an essential research area for discovering the knowledge it contains. Several previous studies defined a set of categories of informational text structure, each of which can be identified by its characteristic signal words. This paper proposed a methodology for automatically extracting those informational text structures from school textbook data. The task was to classify a text into one or more of the predefined categories. Human annotators with sufficient domain knowledge of the textbook subjects performed the manual categorization. For automatic classification, the occurrence frequencies of the signal words were used as the feature vector. A Naive Bayes classifier was trained on 120 manually annotated text samples, and forty text samples were used to test it. The classifier achieved a precision of 92% and an F1 score of 95.6%.
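The pipeline described in the abstract (signal-word frequencies as features, followed by a Naive Bayes classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the signal-word list, the three category names, and the toy training sentences are hypothetical and are not taken from the paper. The sketch also assigns a single label per text, whereas the paper allows one or more categories per text.

```python
import math
import re
from collections import Counter

# Hypothetical signal words; the paper's actual lists are not given in the abstract.
SIGNAL_WORDS = ["because", "therefore", "however", "similarly",
                "first", "then", "finally"]

# Invented toy samples with invented category names, for illustration only.
TOY_SAMPLES = [
    ("The river flooded because of heavy rain; therefore the crops failed.",
     "cause_effect"),
    ("Frogs live near water; however, toads prefer dry land. "
     "Similarly, both lay eggs.", "compare_contrast"),
    ("First mix the flour, then add water, and finally bake.", "sequence"),
]


def featurize(text):
    """Return the occurrence count of each signal word in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [tokens[word] for word in SIGNAL_WORDS]


def train(samples):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing.

    samples is a list of (text, label) pairs. Returns a pair of dicts:
    log priors per label and per-signal-word log likelihoods per label.
    """
    label_counts = Counter(label for _, label in samples)
    word_counts = {label: [0] * len(SIGNAL_WORDS) for label in label_counts}
    for text, label in samples:
        for i, count in enumerate(featurize(text)):
            word_counts[label][i] += count

    log_prior = {label: math.log(n / len(samples))
                 for label, n in label_counts.items()}
    log_likelihood = {}
    for label, counts in word_counts.items():
        total = sum(counts) + len(SIGNAL_WORDS)  # smoothed denominator
        log_likelihood[label] = [math.log((c + 1) / total) for c in counts]
    return log_prior, log_likelihood


def predict(model, text):
    """Return the label with the highest posterior log score."""
    log_prior, log_likelihood = model
    features = featurize(text)
    scores = {
        label: log_prior[label]
        + sum(x * ll for x, ll in zip(features, log_likelihood[label]))
        for label in log_prior
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(TOY_SAMPLES)
    print(predict(model, "Ice melts because heat is added, therefore water forms."))
```

A multi-label variant in the spirit of the paper's task could train one binary classifier of this kind per category and return every category whose classifier fires; that extension is likewise an assumption, not the authors' stated design.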
