Improving tweet clustering using bigrams formed from word associations

In this work we propose a clustering algorithm for Twitter data. In the context of e-commerce, we use the Apriori algorithm to form 2-gram association rules and cluster tweets using self-organizing maps. Because tweets are short, word association becomes all the more important in mining information from them. To check whether 2-grams formed using word associations help increase clustering tendency, we use the Hopkins index. Tested on two separate datasets of 200 and 10,000 tweets, both related to the keyword "Amazon", the analysis shows an improvement in clustering tendency on both datasets. This improvement is potentially useful because grouping customers by their tweets can help businesses determine new trends and identify customers with different sentiments.
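As a rough illustration of two steps named in the abstract, the sketch below mines frequent word pairs with an Apriori-style two-pass search and computes a Hopkins statistic on a feature matrix. It is not the authors' implementation: the function names `frequent_pairs` and `hopkins`, the `min_support` threshold, the whitespace tokenizer, and the sample size `m` are all assumptions made for illustration, and the self-organizing map step is omitted. The Hopkins variant shown uses plain Euclidean distances and the convention in which values near 0.5 suggest uniformly spread data and values near 1 suggest clustered data; other conventions invert this.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def frequent_pairs(tweets, min_support=0.2):
    """Apriori-style two-pass search for frequent word pairs (hypothetical helper)."""
    docs = [set(t.lower().split()) for t in tweets]  # naive whitespace tokenizer
    n = len(docs)
    # Pass 1: keep single words that meet the support threshold.
    word_counts = Counter(w for d in docs for w in d)
    frequent = {w for w, c in word_counts.items() if c / n >= min_support}
    # Pass 2: count only pairs built from frequent words (Apriori pruning).
    pair_counts = Counter(
        pair
        for d in docs
        for pair in combinations(sorted(d & frequent), 2)
    )
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


def hopkins(X, m=None, seed=0):
    """Hopkins statistic (assumed convention: ~0.5 uniform, near 1 clustered)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = m or max(1, n // 10)
    # u: distances from m uniform random points to their nearest data point.
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = np.linalg.norm(uniform[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # w: distances from m sampled data points to their nearest other data point.
    idx = rng.choice(n, size=m, replace=False)
    dist = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dist[np.arange(m), idx] = np.inf  # exclude each sample's distance to itself
    w = dist.min(axis=1)
    return u.sum() / (u.sum() + w.sum())
```

In a pipeline like the one described, the frequent pairs would be added as extra 2-gram features to the tweet vectors, and `hopkins` would be evaluated on the feature matrix with and without those features to compare clustering tendency.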
