Information resources processing using linguistic analysis of textual content

The rapid growth of the Internet and of the volume of content stored on its servers requires tools for automatic content analysis and processing. Intelligent content-processing systems aim to simplify and automate tasks such as content analysis and classification, keyword extraction, digest building, and distribution of content according to specified criteria. This paper presents the results of a study of patterns, characteristics, and dependencies in the automatic text processing of commercial content. Data and information flows in the processes of commercial content transformation are identified and formalized. Methods of linguistic analysis are proposed for automating all content-processing operations. A content management system built using the developed methods continuously monitors content from various sources, gathers and integrates it, and distributes it to customers. Implementing the proposed methods and procedures makes it possible to create and distribute content effectively for a targeted social audience and for individual customers.
