Supervised author recognition with aggregated word embeddings

The volume of text produced grows every day with the rapid development of technology, creating a need for new techniques in text mining and natural language processing. Word embeddings based on artificial neural networks, in particular, have led to highly successful methods. In this paper, an application for the analysis of Turkish texts is built using Word2vFisher, which combines word embeddings with the Fisher vector. A dataset covering 237 different columnists was created by collecting the columns of the last 20 years from the electronic archives of the Hurriyet and Sabah newspapers. One important aspect of this study is that the experiments are conducted on the largest dataset of Turkish newspaper columns to date. Another is the effectiveness of the method in analyzing Turkish texts. It is believed that the method can be utilized in many other domains.
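
To make the idea of aggregating word embeddings with a Fisher vector concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used improved Fisher vector formulation over a diagonal-covariance Gaussian mixture model (GMM), keeping the gradients with respect to the means and standard deviations, followed by power and L2 normalization. The function name `fisher_vector`, the toy 2-dimensional "word vectors", and the GMM parameters are all hypothetical; in practice the vectors would come from a trained Word2vec model and the GMM would be fitted on them.

```python
import math

def fisher_vector(word_vectors, weights, means, sigmas):
    """Aggregate the word vectors of one document into a single Fisher vector.

    weights[k], means[k][d] and sigmas[k][d] describe a diagonal-covariance
    GMM with K components over D-dimensional word vectors (assumed given).
    """
    T, K, D = len(word_vectors), len(weights), len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]

    for x in word_vectors:
        # Log of w_k * N(x | mu_k, diag(sigma_k^2)) for every component.
        log_p = []
        for k in range(K):
            lp = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                lp += -0.5 * z * z - math.log(sigmas[k][d]) - 0.5 * math.log(2 * math.pi)
            log_p.append(lp)

        # Posterior (soft assignment) of each component, computed stably.
        top = max(log_p)
        unnorm = [math.exp(lp - top) for lp in log_p]
        total = sum(unnorm)

        # Accumulate gradient statistics for means and standard deviations.
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                grad_mu[k][d] += gamma * z
                grad_sigma[k][d] += gamma * (z * z - 1.0)

    # Scale by the usual Fisher-information approximation.
    fv = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k])

    # Power normalization (signed square root), then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv

# Hypothetical toy document: three 2-dimensional "word vectors".
doc = [[0.9, 0.1], [1.1, -0.2], [-1.0, 0.8]]
# Hypothetical 2-component GMM.
weights = [0.5, 0.5]
means = [[1.0, 0.0], [-1.0, 1.0]]
sigmas = [[0.5, 0.5], [0.5, 0.5]]

fv = fisher_vector(doc, weights, means, sigmas)
print(len(fv))  # 2 * K * D = 8 values, regardless of document length
```

The result has a fixed length of 2 * K * D for any number of words, so documents of different lengths map to vectors of the same size, which could then be passed to a supervised classifier for author recognition.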
