A new search method for ranking short text messages using semantic features and cluster coherence

This paper introduces a method for ranking search results that uses semantic features and a cluster coherence measure. The method improves the quality of the returned results by grouping semantically related texts into clusters, which are displayed in descending order of cluster size. First, a term-document matrix is constructed in which each document corresponds to an individual text. Then, nonnegative matrix factorization (NMF) is used to group the texts into semantically related clusters. Only clusters whose coherence exceeds a threshold value are displayed. In this way, trending texts that are conceptually similar and recur in the input of multiple users are identified. The advantage of this approach over other methods [6] is that its clusters are computed from semantic similarity rather than from text counts alone.
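The pipeline described above (term-document matrix, NMF clustering, coherence threshold, ordering by cluster size) can be sketched in dependency-free Python. This is a minimal sketch under several assumptions that the abstract does not state. The matrix holds raw term frequencies with no weighting or stop-word removal. NMF is computed with Lee and Seung's multiplicative updates rather than the projected gradient method of [3]. Each text is assigned to the semantic feature with the largest coefficient in its column of H, the assignment rule used in NMF document clustering [5]. "Coherence" is taken to be the mean pairwise cosine similarity of a cluster's term vectors, a stand-in because the abstract does not define the measure. All function names (term_document_matrix, nmf, coherence, rank_clusters) and the parameters k and threshold are hypothetical.

```python
# Sketch only: raw TF weighting, multiplicative-update NMF, argmax assignment and
# mean-cosine coherence are illustrative assumptions, not the paper's stated choices.
import math
import random
import re
from collections import Counter


def term_document_matrix(texts):
    """Build a raw term-frequency matrix A (terms x documents); each text is one document."""
    tokenized = [re.findall(r"[a-z0-9]+", t.lower()) for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(texts) for _ in vocab]
    for j, toks in enumerate(tokenized):
        for w, count in Counter(toks).items():
            A[row[w]][j] = float(count)
    return vocab, A


def transpose(X):
    return [list(r) for r in zip(*X)]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in X]


def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor A ~ W H with W, H >= 0 using Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)  # both k x n
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))  # both m x k
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def cosine(u, v):
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def coherence(vectors):
    """Hypothetical coherence: mean pairwise cosine similarity; singletons score 0."""
    pairs = [(u, v) for i, u in enumerate(vectors) for v in vectors[i + 1:]]
    return sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0


def rank_clusters(texts, k, threshold):
    """Cluster texts with NMF, keep clusters above the threshold, largest first."""
    _, A = term_document_matrix(texts)
    _, H = nmf(A, k)
    docs = transpose(A)  # one term vector per text
    clusters = {}
    for j in range(len(texts)):
        # Assign text j to the semantic feature with the largest coefficient in H.
        best = max(range(k), key=lambda i: H[i][j])
        clusters.setdefault(best, []).append(j)
    kept = [(m, coherence([docs[j] for j in m])) for m in clusters.values()]
    kept = [(m, c) for m, c in kept if c > threshold]
    kept.sort(key=lambda mc: len(mc[0]), reverse=True)  # descending cluster size
    return [([texts[j] for j in m], c) for m, c in kept]


if __name__ == "__main__":
    sample = [
        "earthquake hits the city",
        "strong earthquake hits city",
        "new phone released today",
        "new phone released",
    ]
    for members, score in rank_clusters(sample, k=2, threshold=0.3):
        print(len(members), round(score, 2), members)
```

Measuring coherence in term space keeps the filter independent of the factorization; computing it on the columns of H instead would be an equally plausible reading of the abstract.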

[1] Sun Park, et al. Query-Based Multi-Document Summarization Using Non-Negative Semantic Feature and NMF Clustering. 2008 Fourth International Conference on Networked Computing and Advanced Information Management, 2008.

[2] Efstratios Gallopoulos, et al. TMG: A MATLAB Toolbox for Generating Term-Document Matrices from Text Collections. Grouping Multidimensional Data, 2006.

[3] Chih-Jen Lin, et al. Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computation, 2007.

[4] Hanan Samet, et al. TwitterStand: news in tweets. GIS, 2009.

[5] Xin Liu, et al. Document clustering based on non-negative matrix factorization. SIGIR, 2003.