An Adaptive Sentence Representation Learning Model Based on Multi-gram CNN

Natural Language Processing has received increasing attention recently. Traditional approaches to language modeling rely primarily on elaborately designed features and complicated natural language processing tools, which require a large amount of human effort and are prone to error propagation and data sparsity problems. Deep neural network methods have been shown to learn the implicit semantics of text without extra knowledge. To better learn the deep underlying semantics of sentences, most deep neural network language models adopt a multi-gram strategy. However, current multi-gram strategies in the CNN framework are mostly realized by concatenating trained multi-gram vectors to form the sentence vector, which increases the number of parameters to be learned and is prone to overfitting. To alleviate this problem, we propose a novel adaptive sentence representation learning model based on a multi-gram CNN framework. It learns adaptive importance weights for the different n-gram features and forms the sentence representation by a weighted sum over the extracted n-gram features, which greatly reduces the number of parameters to be learned and alleviates the threat of overfitting. Experimental results show that the proposed method improves performance on sentiment classification and relation classification tasks.
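
The weighted-sum idea can be made concrete with a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: the class name, the n-gram sizes, the filter count, and the softmax over branch logits are all assumptions made for the example. Each n-gram branch produces a pooled feature vector of the same width, and a learned weighting sums the branches into a single sentence vector, so the representation stays at `num_filters` dimensions rather than growing with the number of branches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveMultiGramCNN(nn.Module):
    """Sketch of the weighted-sum multi-gram idea; hyperparameters are
    illustrative, not taken from the paper."""

    def __init__(self, emb_dim=100, num_filters=100, ngram_sizes=(2, 3, 4, 5)):
        super().__init__()
        # One convolution branch per n-gram size; all branches share the
        # same output width so their pooled features can be summed.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, kernel_size=n, padding=n - 1)
             for n in ngram_sizes]
        )
        # One scalar importance weight per n-gram branch, learned jointly
        # with the filters instead of concatenating branch outputs.
        self.branch_logits = nn.Parameter(torch.zeros(len(ngram_sizes)))

    def forward(self, x):
        # x: (batch, seq_len, emb_dim); Conv1d expects (batch, emb_dim, seq_len).
        x = x.transpose(1, 2)
        # Max-over-time pooling gives one fixed-size vector per branch.
        feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        feats = torch.stack(feats, dim=1)           # (batch, n_branches, num_filters)
        weights = torch.softmax(self.branch_logits, dim=0)  # adaptive importance
        # Weighted sum keeps the sentence vector at num_filters dimensions,
        # instead of n_branches * num_filters after concatenation.
        return (weights.view(1, -1, 1) * feats).sum(dim=1)  # (batch, num_filters)

# Example: batch of 8 sentences, 20 tokens each, 100-dim embeddings.
model = AdaptiveMultiGramCNN()
sent_vecs = model(torch.randn(8, 20, 100))  # -> shape (8, 100)
```

Compared with concatenation, which yields a `len(ngram_sizes) * num_filters`-dimensional vector and a correspondingly larger classifier on top, the weighted sum keeps the downstream parameter count independent of the number of n-gram branches, which is the overfitting argument the abstract makes.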
