Semantic Enriched Short Text Clustering

This paper addresses the clustering of short texts, namely free-form answers gathered during brainstorming seminars. These answers are short, often incomplete, and strongly biased toward the question asked, so establishing a notion of proximity between texts is challenging. In addition, the number of answers reaches at most about one hundred, which causes sparsity. We compare three text clustering methods to choose the best one for this specific task, and then show how the chosen method can be improved by semantic enrichment, using neural-based distributional models and external knowledge resources. The algorithms were evaluated on unique data sets collected during the seminars.
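
To make the sparsity problem concrete, the following is a minimal Python sketch, not the authors' method. Two short answers that share no words get a bag-of-words cosine similarity of zero. Averaging word vectors, assumed here as one simple form of distributional enrichment, still yields a meaningful proximity. The vectors in `TOY_VECTORS` are hypothetical hand-made values standing in for a trained model such as word2vec. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy word vectors standing in for a trained distributional model.
TOY_VECTORS: Dict[str, List[float]] = {
    "cheaper": [0.9, 0.1, 0.0],
    "lower":   [0.8, 0.2, 0.1],
    "tickets": [0.1, 0.9, 0.0],
    "fares":   [0.2, 0.8, 0.1],
}


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (no enrichment)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def mean_vector(text: str, vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of known words; out-of-vocabulary words are skipped."""
    vecs = [vectors[w] for w in text.lower().split() if w in vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def dense_cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def enriched_similarity(a: str, b: str, vectors: Dict[str, List[float]]) -> float:
    """Similarity of two short texts via averaged word vectors (assumed enrichment)."""
    ua, ub = mean_vector(a, vectors), mean_vector(b, vectors)
    if ua is None or ub is None:
        return 0.0  # no known words in one of the texts
    return dense_cosine(ua, ub)


if __name__ == "__main__":
    a, b = "cheaper tickets", "lower fares"
    print(f"bag-of-words: {bow_cosine(a, b):.3f}")
    print(f"enriched:     {enriched_similarity(a, b, TOY_VECTORS):.3f}")
```

With a real model, the toy table would be replaced by trained embeddings. Skipping out-of-vocabulary words, as done here, is only one of several possible choices. Knowledge-resource enrichment would instead add related terms to each answer before comparison.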
