Background Knowledge Based Multi-Stream Neural Network for Text Classification

As a fundamental and typical task in natural language processing, text classification has been widely applied in many fields. However, most existing corpora, on which text classification depends, are imbalanced, so classifiers trained on them tend to favor the categories with more texts. In this paper, we propose a background knowledge based multi-stream neural network to compensate for the imbalanced or insufficient information caused by the limitations of the training corpus. The multi-stream network consists mainly of a basal stream, which retains the original sequence information, and background knowledge based streams. The background knowledge is composed of keywords and co-occurring words extracted from an external corpus. The background knowledge based streams supply supplemental information and reinforce the basal stream. To better fuse the features extracted from the different streams, early-fusion and two after-fusion strategies are employed. Results on both a Chinese corpus and an English corpus demonstrate that the proposed background knowledge based multi-stream neural network performs well in classification tasks.
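The difference between fusing streams early and fusing them afterwards can be illustrated with a minimal sketch. It is not the network proposed in the paper: the function names, the toy feature vectors, and the two after-fusion rules shown here (score averaging and element-wise maximum) are assumptions chosen for illustration, since the abstract does not specify which after-fusion strategies are used.

```python
from typing import List, Sequence

Vector = List[float]


def early_fusion(stream_features: Sequence[Vector]) -> Vector:
    """Early fusion (assumed form): concatenate the feature vectors of all
    streams into one vector that a single classifier would consume."""
    fused: Vector = []
    for features in stream_features:
        fused.extend(features)
    return fused


def after_fusion_average(stream_scores: Sequence[Vector]) -> Vector:
    """After-fusion, hypothetical rule 1: average the per-class scores
    produced independently by each stream."""
    num_streams = len(stream_scores)
    num_classes = len(stream_scores[0])
    return [
        sum(scores[c] for scores in stream_scores) / num_streams
        for c in range(num_classes)
    ]


def after_fusion_max(stream_scores: Sequence[Vector]) -> Vector:
    """After-fusion, hypothetical rule 2: keep the highest score that any
    stream assigns to each class."""
    num_classes = len(stream_scores[0])
    return [max(scores[c] for scores in stream_scores) for c in range(num_classes)]


def predict(scores: Vector) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy per-class scores for a two-class problem (made-up numbers).
basal_scores = [0.6, 0.4]         # basal stream: original word sequence
keyword_scores = [0.2, 0.8]       # stream built on external keywords
cooccurrence_scores = [0.3, 0.7]  # stream built on co-occurring words

all_scores = [basal_scores, keyword_scores, cooccurrence_scores]
averaged = after_fusion_average(all_scores)
maximum = after_fusion_max(all_scores)
```

In this toy case the basal stream alone would pick class 0, while both after-fusion rules pick class 1 because the two background knowledge streams outweigh it. With early fusion, by contrast, the streams are merged before any class scores exist, so a single classifier sees all features jointly.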
