An embedding model for text classification

Existing word-embedding learning algorithms employ only the contexts of words, yet different text documents use words and their associated parts of speech very differently. Based on this observation, and in order to obtain more appropriate word embeddings and further improve text-classification performance, this paper studies a representation that combines words with their parts of speech. First, more expressive word embeddings are obtained by using both the parts of speech and the contexts of words. Second, to improve the efficiency of look-up tables, we construct a two-dimensional table for representing the words in text documents. Finally, the two-dimensional table and Bayes' theorem are used for text classification. Experimental results on standard data sets show that our model achieves better classification results and offers greater versatility and portability than alternative models.
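
As a rough illustration of the pipeline described above, the sketch below builds a two-dimensional table indexed by word and part of speech and applies Bayes' theorem to score classes. The toy corpus, the POS tags, the Laplace smoothing parameter `alpha`, and all function names are assumptions introduced for illustration only; the paper's actual embedding step and table layout are not reproduced here.

```python
# Hypothetical sketch: a two-dimensional (word, part-of-speech) table feeding a
# Bayes-rule classifier. The corpus, tags, and smoothing are illustrative assumptions.
from collections import defaultdict
import math

# Toy corpus: each document is a list of (word, POS) pairs plus a class label.
corpus = [
    ([("great", "ADJ"), ("movie", "NOUN")], "pos"),
    ([("boring", "ADJ"), ("plot", "NOUN")], "neg"),
    ([("great", "ADJ"), ("acting", "NOUN")], "pos"),
]

# Two-dimensional table: counts indexed by (word, POS), kept per class.
table = defaultdict(lambda: defaultdict(int))   # table[label][(word, pos)] -> count
label_counts = defaultdict(int)
for doc, label in corpus:
    label_counts[label] += 1
    for word, pos in doc:
        table[label][(word, pos)] += 1

def classify(doc, alpha=1.0):
    """Score each class with Bayes' theorem over (word, POS) features."""
    total_docs = sum(label_counts.values())
    vocab = {key for counts in table.values() for key in counts}
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)   # log prior
        total = sum(table[label].values())
        for word, pos in doc:
            # Laplace-smoothed likelihood of the (word, POS) pair given the class.
            count = table[label].get((word, pos), 0)
            score += math.log((count + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify([("great", "ADJ"), ("plot", "NOUN")]))   # prints "pos" on this toy data
```

In this sketch the (word, POS) pair plays the role of the table key; the paper's model would instead look up learned embeddings in the two-dimensional table before applying the Bayesian decision rule.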
