Paragraph Vector Representation Based on Word to Vector and CNN Learning
Natural-language document processing includes retrieval, sentiment analysis, theme extraction, and similar tasks. Classical methods for these tasks are based on probabilistic models, semantic models, and machine-learning networks. Probabilistic models inherently suffer from \emph{loss of semantic information}, which reduces processing accuracy. Machine-learning approaches may be supervised, unsupervised, or semi-supervised, and both semantic models and supervised learning require labeled corpora. Reliable labeled corpora are usually built by hand, which is \emph{costly and time-consuming} because annotators must read and label every document. More recently, the continuous bag-of-words (CBOW) model has proven efficient for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. This model can easily be extended to learn paragraph vectors, but the resulting vectors are \emph{not precise}. To address these problems, this paper develops a new model for learning paragraph vectors: we combine the CBOW model with convolutional neural networks (CNNs) to build a new deep learning model. Experimental results show that the new model outperforms the CBOW model in semantic relatedness and accuracy in the paragraph vector space.
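The abstract does not say exactly how the CBOW model and the CNN are combined. The sketch below is therefore only an illustration of the two standard ingredients it names, not the paper's model. It assumes three things:

- a textbook CBOW forward pass, which averages the context-word embeddings and applies a softmax over the vocabulary to predict the centre word;
- a textbook text CNN, which applies a 1-D convolution with ReLU and max-over-time pooling to turn a paragraph's word vectors into a fixed-length vector;
- cosine similarity as the measure of semantic relatedness.

All function names are hypothetical, weights are random, and training by backpropagation is omitted.

```python
import math
import random


def cbow_context_vector(embeddings, context_ids):
    """CBOW hidden layer (assumed standard form): mean of the context word embeddings."""
    dim = len(embeddings[0])
    h = [0.0] * dim
    for i in context_ids:
        for d in range(dim):
            h[d] += embeddings[i][d]
    return [x / len(context_ids) for x in h]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_predict(embeddings, output_weights, context_ids):
    """Probability distribution over the vocabulary for the centre word."""
    h = cbow_context_vector(embeddings, context_ids)
    scores = [sum(w * x for w, x in zip(row, h)) for row in output_weights]
    return softmax(scores)


def conv_max_pool(word_vectors, filters, width):
    """Hypothetical paragraph encoder: 1-D convolution + ReLU + max-over-time pooling.

    Each filter is a flat list of width * dim weights applied to a window of
    `width` consecutive word vectors; the output has one value per filter.
    """
    if len(word_vectors) < width:
        raise ValueError("paragraph is shorter than the filter width")
    pooled = []
    for f in filters:
        best = -math.inf
        for start in range(len(word_vectors) - width + 1):
            window = [x for vec in word_vectors[start:start + width] for x in vec]
            activation = max(0.0, sum(a * b for a, b in zip(f, window)))  # ReLU
            best = max(best, activation)
        pooled.append(best)
    return pooled


def cosine(u, v):
    """Cosine similarity, assumed here as the semantic-relatedness measure."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    random.seed(0)
    V, D, F, W = 6, 4, 3, 2  # vocabulary size, embedding dim, filters, filter width
    emb = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
    out_w = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
    probs = cbow_predict(emb, out_w, [0, 1, 3, 4])  # predict word 2 from its context
    filters = [[random.uniform(-0.5, 0.5) for _ in range(W * D)] for _ in range(F)]
    paragraph = [emb[i] for i in range(V)]
    print(len(probs), conv_max_pool(paragraph, filters, W))
```

Max-over-time pooling is used here only because it maps paragraphs of any length to a vector of fixed size, one entry per filter. Whether the paper uses this pooling or feeds the CBOW embeddings to the CNN in this way is not stated in the abstract.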