Tag Prediction in Social Annotation Systems Based on CNN and BiLSTM

Social annotation systems enable users to annotate large-scale texts with tags, which provide a convenient way to discover, share, and organize rich information. However, manually annotating massive amounts of text is generally costly in manpower. Automatic annotation through tag prediction can therefore greatly improve the efficiency of semantic identification of social content. In this paper, we propose a tag prediction model based on convolutional neural networks (CNN) and a bi-directional long short-term memory (BiLSTM) network, with which the tags of texts can be predicted efficiently and accurately. Experiments on real-world datasets from a social Q&A community show that the proposed CNN-BiLSTM model achieves state-of-the-art accuracy for tag prediction.
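To make the general shape of such a model concrete, the following is a minimal, untrained sketch in plain Python. The abstract does not specify layer order, layer sizes, or the output layer, so everything here is an assumption made for illustration: word embeddings feed a 1-D convolution, the convolutional features feed a forward and a backward LSTM, and the two final hidden states are mapped to one sigmoid score per tag. The class name `CnnBiLstmTagger`, the hyperparameters, the toy vocabulary and tag set, the omission of bias terms, and the top-k selection rule are all hypothetical and are not taken from the paper.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained sketch is reproducible


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_relu(seq, filters, width):
    """Slide each filter over windows of `width` consecutive word vectors."""
    out = []
    for i in range(len(seq) - width + 1):
        window = [x for vec in seq[i:i + width] for x in vec]  # flatten window
        out.append([max(0.0, a) for a in matvec(filters, window)])
    return out


def lstm_last_state(seq, w, hidden):
    """Run one LSTM direction (no bias terms) and return its final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = matvec(w, x + h)  # stacked pre-activations: input, forget, output, candidate
        i = [sigmoid(a) for a in z[:hidden]]
        f = [sigmoid(a) for a in z[hidden:2 * hidden]]
        o = [sigmoid(a) for a in z[2 * hidden:3 * hidden]]
        g = [math.tanh(a) for a in z[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


class CnnBiLstmTagger:
    """Hypothetical CNN -> BiLSTM -> sigmoid tagger (assumed layout, untrained)."""

    def __init__(self, vocab, tags, emb=8, n_filters=6, width=3, hidden=5):
        self.tags, self.width, self.hidden = tags, width, hidden
        self.emb = {w: [random.uniform(-0.1, 0.1) for _ in range(emb)] for w in vocab}
        self.unk = [0.0] * emb  # zero vector for unknown words and padding
        self.filters = rand_matrix(n_filters, emb * width)
        self.w_fwd = rand_matrix(4 * hidden, n_filters + hidden)
        self.w_bwd = rand_matrix(4 * hidden, n_filters + hidden)
        self.w_out = rand_matrix(len(tags), 2 * hidden)

    def scores(self, tokens):
        """Return one score in (0, 1) per tag, in the order of self.tags."""
        seq = [self.emb.get(t, self.unk) for t in tokens]
        # Pad short texts so that at least one convolution window exists.
        seq += [self.unk] * max(0, self.width - len(seq))
        feats = conv1d_relu(seq, self.filters, self.width)
        # Concatenate the final states of the forward and backward passes.
        h = (lstm_last_state(feats, self.w_fwd, self.hidden)
             + lstm_last_state(feats[::-1], self.w_bwd, self.hidden))
        return [sigmoid(a) for a in matvec(self.w_out, h)]

    def predict(self, tokens, k=2):
        """Return the k highest-scoring tags."""
        scored = sorted(zip(self.scores(tokens), self.tags), reverse=True)
        return [tag for _, tag in scored[:k]]


model = CnnBiLstmTagger(vocab=["how", "to", "train", "a", "neural", "network"],
                        tags=["python", "deep-learning", "nlp"])
question = "how to train a neural network".split()
probs = model.scores(question)
top = model.predict(question, k=2)
```

Because the weights are random, the predicted tags carry no meaning; the sketch only shows how data might flow through such a model. Independent sigmoid outputs are used here because a text can carry several tags at once, which is one common choice for multi-label prediction rather than a detail confirmed by the abstract.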
