Classification of Turkish News Content by Deep Learning Based LSTM Using Fasttext Model

As Internet use has grown, the amount of content produced has grown with it. Text classification allows this content to be categorized automatically. In this study, LSTM (Long Short-Term Memory), a special type of recurrent neural network (RNN), was used together with the Fasttext model to classify news texts. The Fasttext, Word2vec and Doc2vec models were each used to classify the data set, and their success rates were compared. LSTM was then used to classify the news data with the Fasttext model, which gave the most successful result.
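One way Fasttext differs from Word2vec is that it represents a word by its character n-grams as well as the word itself. That may help with an agglutinative language such as Turkish, where inflected forms share long character sequences. The sketch below illustrates only the n-gram extraction step. It is not the study's code: the function names `char_ngrams` and `shared_ngrams` are hypothetical, the example words are chosen for illustration, and the 3 to 6 range is assumed to match the library's usual defaults rather than the settings used here.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, in the style of Fasttext subwords.

    The word is wrapped in '<' and '>' so that n-grams at the start or end
    of the word are distinguishable from the same characters in the middle.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def shared_ngrams(a, b, n_min=3, n_max=6):
    """Return the set of n-grams that two words have in common."""
    return set(char_ngrams(a, n_min, n_max)) & set(char_ngrams(b, n_min, n_max))


if __name__ == "__main__":
    # "haber" (news) and its plural "haberler" share most of their subwords.
    print(char_ngrams("haber", 3, 3))
    print(sorted(shared_ngrams("haber", "haberler")))
```

In a full subword model, each n-gram would be mapped to a vector and the word vector built from them, so related forms such as "haber" and "haberler" end up with similar representations even if one of them is rare in the training data.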
