Context-Dependent Conceptualization

Conceptualization seeks to map a short text (i.e., a word or a phrase) to a set of concepts as a mechanism for understanding text. Most prior research on conceptualization uses human-crafted knowledge bases that map instances to concepts. A limitation of such approaches is that the mappings are not context-sensitive. To overcome this limitation, we propose a framework that harnesses a probabilistic topic model, which inherently captures semantic relations between words. By combining latent Dirichlet allocation, a widely used topic model, with Probase, a large-scale probabilistic knowledge base, we develop a corpus-based framework for context-dependent conceptualization. Through this simple but powerful framework, we improve conceptualization and enable a wide range of applications that rely on semantic understanding of short texts, including frame element prediction, word similarity in context, ad-query similarity, and query similarity.
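The abstract does not spell out how the topic model and the knowledge base are combined. As a rough illustration only, the sketch below assumes one simple combination: the context-free prior P(concept | word) from a Probase-style table is reweighted by how well each concept fits the topics inferred from the surrounding words, i.e. score(c) ∝ P(c | w) · Σ_z P(z | context) P(c | z). All names (`PROBASE`, `PHI`, `CONCEPT_GIVEN_TOPIC`, `topic_posterior`, `conceptualize`) and all probabilities are hypothetical toy values, and the topic scoring is a naive-Bayes-style stand-in over fixed topic-word probabilities, not actual LDA inference.

```python
import math

# Hypothetical Probase-style prior P(concept | instance); context-free.
PROBASE = {
    "apple": {"fruit": 0.6, "company": 0.4},
}

# Hypothetical topic-word probabilities phi[z][w]; a fitted LDA model would supply these.
PHI = {
    "food": {"apple": 0.05, "pie": 0.04, "eat": 0.05, "iphone": 0.0001, "stock": 0.001},
    "tech": {"apple": 0.05, "pie": 0.0005, "eat": 0.001, "iphone": 0.06, "stock": 0.03},
}

# Hypothetical concept-topic affinity P(concept | topic).
CONCEPT_GIVEN_TOPIC = {
    "food": {"fruit": 0.9, "company": 0.1},
    "tech": {"fruit": 0.1, "company": 0.9},
}


def topic_posterior(context, phi, smoothing=1e-6):
    """Score topics for the context words using a uniform topic prior and a
    product of smoothed topic-word probabilities (a simplification, not LDA inference)."""
    log_scores = {
        z: sum(math.log(words.get(w, 0.0) + smoothing) for w in context)
        for z, words in phi.items()
    }
    # Subtract the max log score before exponentiating for numerical stability.
    m = max(log_scores.values())
    unnorm = {z: math.exp(s - m) for z, s in log_scores.items()}
    total = sum(unnorm.values())
    return {z: v / total for z, v in unnorm.items()}


def conceptualize(word, context, probase=PROBASE, phi=PHI, c_given_z=CONCEPT_GIVEN_TOPIC):
    """Reweight the prior P(c | word) by sum_z P(z | context) * P(c | z), then normalize."""
    p_topic = topic_posterior(context, phi)
    scores = {}
    for concept, prior in probase[word].items():
        affinity = sum(p_topic[z] * c_given_z[z].get(concept, 0.0) for z in p_topic)
        scores[concept] = prior * affinity
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


if __name__ == "__main__":
    print(conceptualize("apple", ["eat", "pie"]))       # "fruit" dominates
    print(conceptualize("apple", ["iphone", "stock"]))  # "company" dominates
```

With an empty context the topic posterior is uniform, so under these toy tables the result falls back to the context-free prior; with context words, the same instance can shift toward a different concept, which is the behavior a context-sensitive mapping is meant to provide.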