Distributional Semantic Phrase Clustering and Conceptualization Using Probabilistic Knowledgebase

Distributional Semantics is an active research area in natural language processing (NLP) that develop methods for quantifying semantic similarities between linguistic elements in large samples of data. Short text conceptualization on the other hand is a technique for enriching short texts so that it become more interpretable. This is needed because most text mining tasks including topic modeling and clustering are based on statistical methods and won’t consider the semantics of text. This paper proposes a novel framework for combining distributional semantics and short text conceptualization for better interpretability of phrases in text data. Experiments on real-world datasets show that this method can better enrich phrases that are represented in distributional semantic spaces.