Paper Information - The Text Classification Based on Big Data Analysis for Keyword Definition Using Stemming

Software for stemming Ukrainian-language texts with the Porter algorithm has been developed and implemented, together with methods for classifying texts written in Ukrainian. The software is written in Python and uses the NLTK library. Existing text-analysis methods, including classification and clustering, were analysed. Methods for vectorising text data and approaches to maintaining the dictionary were considered. In addition, information about previously analysed data is saved.
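As a rough illustration of the pipeline the abstract describes (suffix-stripping stemming, a dictionary of stems, and count-based vectorisation), here is a minimal sketch in plain Python. It is not the authors' implementation: the suffix list `SUFFIXES`, the threshold `MIN_STEM_LEN`, and the helper names `stem`, `tokenize`, `build_dictionary` and `vectorise` are all assumptions made for the example, and the single longest-suffix rule is a simplified, Porter-style stand-in for a full rule set. It does not use NLTK, so it runs with the standard library alone.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of Ukrainian inflectional endings.
# A usable stemmer would need a much larger, linguistically motivated set.
SUFFIXES = sorted(
    ["ами", "ями", "ого", "ому", "ів", "ах", "ях", "ою", "ею", "ий",
     "а", "я", "у", "ю", "и", "і", "е", "о"],
    key=len,
    reverse=True,  # try longer endings before shorter ones
)

# Assumed lower bound on stem length, so short words are left untouched.
MIN_STEM_LEN = 3


def stem(word):
    """Strip the longest matching ending if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split text into lowercase runs of letters (apostrophes also split words)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_dictionary(documents):
    """Map each distinct stem to a column index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            vocab.setdefault(stem(token), len(vocab))
    return vocab


def vectorise(text, vocab):
    """Return a bag-of-stems count vector; stems missing from vocab are ignored."""
    counts = Counter(stem(token) for token in tokenize(text))
    vector = [0] * len(vocab)
    for s, n in counts.items():
        if s in vocab:
            vector[vocab[s]] = n
    return vector


corpus = ["Книга лежить на столі", "Книги і мова"]
vocab = build_dictionary(corpus)
vector = vectorise("книгами і мовою", vocab)
```

Because inflected forms such as "книга", "книги" and "книгами" reduce to the same stem, they share one dictionary entry, which keeps the vectors compact. The resulting count vectors could then be passed to any classifier or clustering method.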