Study on text representation method based on deep learning and topic information

Deep learning provides a new modeling approach for natural language processing. In recent years it has been applied to language modeling, text classification, machine translation, sentiment analysis, question answering, distributed word representations, and related tasks, yielding a series of theoretical and practical results. For the text representation task, this paper studies a strategy for fusing global and local context information and proposes a word representation model called Topic-based CBOW, which integrates a deep neural network, topic information, and word order information. Building on the distributed word representations learned by Topic-based CBOW, a short text representation method based on TF–IWF-weighted pooling is then proposed. Finally, the Topic-based CBOW model and the short text representation are compared against baseline models; the results show that introducing the topic vector and retaining word order information improves the quality of the distributed word representations to some extent, and the resulting text representation also performs well on text classification tasks.
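The abstract does not give implementation details, but a minimal sketch may help fix the two ideas it describes: a CBOW-style predictor whose input combines local context vectors with a document-level topic vector, and a TF–IWF-weighted average of word vectors as the short-text representation. All names, shapes, and the concatenation and weighting choices below are illustrative assumptions, not the authors' implementation; in particular, this sketch averages the context window, whereas the paper's model additionally retains word order information.

```python
# Hedged sketch of (a) a CBOW-style scorer augmented with a topic vector and
# (b) TF-IWF-weighted pooling of word vectors into a short-text vector.
# Dimensions, function names, and the softmax formulation are assumptions.
import numpy as np

rng = np.random.default_rng(0)

V, D, T = 1000, 100, 20                           # vocab size, embedding dim, topic dim (assumed)
word_emb = rng.normal(scale=0.1, size=(V, D))     # input word embeddings
out_emb = rng.normal(scale=0.1, size=(V, D + T))  # output (softmax) weights over context + topic

def topic_cbow_scores(context_ids, topic_vec):
    """Score every vocabulary word from averaged context vectors plus a global topic vector."""
    context = word_emb[context_ids].mean(axis=0)  # plain average; the paper also keeps word order
    h = np.concatenate([context, topic_vec])      # fuse local context with global topic information
    logits = out_emb @ h
    logits -= logits.max()                        # numerical stability
    return np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary

def tf_iwf_pooling(token_ids, corpus_counts, total_count):
    """Short-text vector: TF-IWF-weighted average of word embeddings (one plausible weighting)."""
    ids, tf = np.unique(token_ids, return_counts=True)
    tf = tf / len(token_ids)                              # term frequency within the short text
    iwf = np.log(total_count / corpus_counts[ids])        # inverse word frequency over the corpus
    w = tf * iwf
    return (w[:, None] * word_emb[ids]).sum(axis=0) / w.sum()

# Toy usage with hypothetical token ids and corpus counts
probs = topic_cbow_scores(np.array([3, 17, 42, 99]), rng.normal(size=T))
corpus_counts = rng.integers(1, 10_000, size=V).astype(float)
doc_vec = tf_iwf_pooling(np.array([3, 17, 17, 42]), corpus_counts, corpus_counts.sum())
print(probs.shape, doc_vec.shape)                 # (1000,) (100,)
```

The weighted pooling keeps the representation as simple as an averaged bag of embeddings while letting corpus-level word statistics down-weight uninformative terms, which is the role TF–IWF plays in the proposed short text representation.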
