Intellectualization of Knowledge Acquisition of Academic Texts as an Answer to Challenges of Modern Information Society

Extracting knowledge from an ever-increasing information flow is one of the main challenges of the modern information society. The paper considers possibilities and means for intellectualizing this process for such an important information source as academic texts. Here the user faces the task of finding fragments relevant to the subject of interest within vast textual documents, often written in a foreign language. We experimentally investigated the comparative effectiveness of topic segmentation (TS) algorithms on extended coherent academic texts and substantiated a procedure for the instrumental evaluation of their effectiveness. We considered the influence of the most significant characteristics of the text, including the original language, structural organization (levels of headings), and research subject (engineering, information technology, and medicine). We have shown that, to intellectualize knowledge acquisition from academic texts, the reader must be presented with the results of TS performed by different algorithms in combination. We propose a system for the combined visualization of TS results and have developed a corresponding software solution. For extended coherent texts, the visualization system explicitly displays the semantic structure of the text. This allows the user to locate and analyze not the whole text but only the fragments that match their current information needs, and thus to obtain a complete picture of the subject of interest.
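The abstract does not say which measure the instrumental effectiveness evaluation relies on, nor how results from several algorithms are merged for display. The following minimal Python sketch is therefore only an illustration under stated assumptions. It assumes each segmentation is given as a list of segment lengths counted in sentences. It uses two standard segmentation error measures, Pk (Beeferman et al.) and WindowDiff (Pevzner and Hearst), which may differ from the authors' choice. All function names are hypothetical, including `boundary_votes`, which shows one possible way to present boundaries from several algorithms together.

```python
from typing import List, Optional, Sequence


def labels_from_lengths(lengths: Sequence[int]) -> List[int]:
    """Expand segment lengths, e.g. [2, 3], into per-sentence segment ids [0, 0, 1, 1, 1]."""
    labels: List[int] = []
    for seg_id, length in enumerate(lengths):
        labels.extend([seg_id] * length)
    return labels


def boundaries_from_lengths(lengths: Sequence[int]) -> List[int]:
    """Indicator per gap between adjacent sentences: 1 if a new segment starts after sentence i."""
    labels = labels_from_lengths(lengths)
    return [int(labels[i] != labels[i + 1]) for i in range(len(labels) - 1)]


def default_k(ref: Sequence[int]) -> int:
    """Conventional window size: half the mean reference segment length (at least 1)."""
    return max(1, round(sum(ref) / len(ref) / 2))


def pk(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Pk: share of windows where ref and hyp disagree on whether sentences i and i+k share a segment."""
    ref_l, hyp_l = labels_from_lengths(ref), labels_from_lengths(hyp)
    if len(ref_l) != len(hyp_l):
        raise ValueError("segmentations must cover the same number of sentences")
    k = default_k(ref) if k is None else k
    n = len(ref_l)
    if n <= k:
        raise ValueError("text is too short for the chosen window size")
    errors = sum(
        (ref_l[i] == ref_l[i + k]) != (hyp_l[i] == hyp_l[i + k]) for i in range(n - k)
    )
    return errors / (n - k)


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """WindowDiff: share of windows where ref and hyp contain different numbers of boundaries."""
    rb, hb = boundaries_from_lengths(ref), boundaries_from_lengths(hyp)
    if len(rb) != len(hb):
        raise ValueError("segmentations must cover the same number of sentences")
    k = default_k(ref) if k is None else k
    n = len(rb) + 1  # number of sentences
    if n <= k:
        raise ValueError("text is too short for the chosen window size")
    # Window i spans the gaps between sentences i .. i+k, i.e. boundary slots i .. i+k-1.
    errors = sum(sum(rb[i:i + k]) != sum(hb[i:i + k]) for i in range(n - k))
    return errors / (n - k)


def boundary_votes(hypotheses: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical combination step: fraction of algorithms placing a boundary at each gap."""
    vectors = [boundaries_from_lengths(h) for h in hypotheses]
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all segmentations must cover the same number of sentences")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    reference = [3, 3]            # six sentences, boundary after the third
    algorithm_a = [3, 3]          # matches the reference exactly
    algorithm_b = [2, 4]          # boundary placed one sentence too early
    print(pk(reference, algorithm_b), window_diff(reference, algorithm_b))
    print(boundary_votes([algorithm_a, algorithm_b]))
```

For the toy example above, algorithm B receives Pk = 0.5 and WindowDiff = 0.5, while algorithm A scores 0.0 on both. The vote vector `[0.0, 0.5, 0.5, 0.0, 0.0]` shows the two algorithms splitting their support between adjacent gaps. A visualization could shade such gaps by agreement rather than committing to a single algorithm's output, which is one way to read the abstract's call to present TS results from different algorithms in combination.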
