Leveraging Conceptualization for Short-Text Embedding

Most short-text embedding models represent a short text using only the literal meanings of its words, which leaves them unable to discriminate among the senses of ubiquitously polysemous terms. To strengthen the semantic representation of short texts, we (i) propose a novel short-text conceptualization algorithm that assigns relevant concepts to each short text, and (ii) incorporate the conceptualization results into learning conceptual short-text embeddings. The resulting representation is more expressive than widely used alternatives such as latent topic models. The conceptualization algorithm is built on a novel co-ranking framework in which the two kinds of signals, the words and the concepts, fully interact to yield a reliable conceptualization of each short text. We further extend the conceptual short-text embedding model with an attention-based mechanism that selects the relevant words within the context, making prediction more efficient. Experiments on real-world datasets demonstrate that both the proposed conceptual short-text embedding model and the short-text conceptualization algorithm are more effective than state-of-the-art methods.
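
The conceptualization step described above is a co-ranking procedure in which words and concepts mutually reinforce each other. As an illustration only, the following is a minimal Python sketch of such a mutual-reinforcement iteration; the matrices, damping factor, normalization, and all names here are assumptions made for the sketch, not the paper's exact co-ranking formulation.

    # Minimal sketch of a co-ranking style conceptualization step, assuming a
    # HITS/PageRank-like mutual reinforcement between words and concepts.
    # Matrices, damping, and normalization are illustrative assumptions.
    import numpy as np

    def co_rank(word_concept, word_word, concept_concept,
                damping=0.85, iters=50, tol=1e-8):
        """Iteratively score the words and concepts of one short text.

        word_concept    : (n_words, n_concepts) association strengths (e.g. isA scores)
        word_word       : (n_words, n_words) intra-word affinities (e.g. co-occurrence)
        concept_concept : (n_concepts, n_concepts) intra-concept affinities
        Returns (word_scores, concept_scores), each summing to 1.
        """
        n_w, n_c = word_concept.shape
        w = np.full(n_w, 1.0 / n_w)          # initial word scores
        c = np.full(n_c, 1.0 / n_c)          # initial concept scores

        def normalize(v):
            s = v.sum()
            return v / s if s > 0 else np.full_like(v, 1.0 / len(v))

        for _ in range(iters):
            # Words are reinforced by their own graph and by the concepts they evoke.
            w_new = normalize(damping * word_word.T @ w
                              + (1 - damping) * word_concept @ c)
            # Concepts are reinforced by their own graph and by their supporting words.
            c_new = normalize(damping * concept_concept.T @ c
                              + (1 - damping) * word_concept.T @ w)
            if np.abs(w_new - w).sum() + np.abs(c_new - c).sum() < tol:
                w, c = w_new, c_new
                break
            w, c = w_new, c_new
        return w, c

Under this reading, the concepts with the highest final scores would be the ones assigned to the short text.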
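The attention-based extension selects the relevant words within the context when making a prediction. The sketch below shows one way such attention pooling could be combined with a concept vector in a CBOW-style input; scoring the context words against the concept vector and averaging the two signals are illustrative assumptions, not the paper's exact model.

    # Minimal sketch of attention over context words in a CBOW-style setup
    # extended with a concept vector. The scoring and fusion choices below
    # are assumptions for illustration.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attentive_context(context_vecs, concept_vec):
        """Pool context word vectors, attending more to concept-related words.

        context_vecs : (k, d) embeddings of the k context words
        concept_vec  : (d,)   embedding of the concept(s) assigned to the short text
        """
        scores = context_vecs @ concept_vec    # (k,) relevance of each context word
        alpha = softmax(scores)                # normalized attention weights
        pooled = alpha @ context_vecs          # (d,) attention-weighted context vector
        return (pooled + concept_vec) / 2.0    # fuse word-level and concept-level signals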
