Cluster of tweet users based on optimal set

For years, even decades, researchers have dealt with the problem of duplicate or overlapping clusters within a cluster set. Clusters overlap one another, as with social networking groups or movies grouped by genre. In this paper, hierarchical clustering is used to cluster users based on their interactions, which creates numerous clusters of different sizes at different hierarchical levels. In doing so, many overlapping clusters are generated, but duplicates are not removed. Duplication poses a challenge for differentiation. Our work is twofold: first, to cluster users at different hierarchical levels, generating one set of clusters per level; and second, to find the optimal set among these cluster sets using only the mean and standard deviation. What counts as optimal differs with the requirement. Our work shows that the optimal set can be chosen according to the requirement.
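The selection step can be illustrated with a minimal sketch. The abstract does not say which quantity the mean and standard deviation are computed over, so this example assumes cluster sizes. The function names (`dedupe`, `size_stats`, `pick_level`), the two requirement labels, and the toy data are all hypothetical and not taken from the paper. Dropping exact duplicates before computing statistics is likewise an assumption.

```python
from statistics import mean, pstdev

def dedupe(clusters):
    """Drop exact duplicate clusters, keeping first-occurrence order (assumed pre-step)."""
    seen = set()
    unique = []
    for cluster in clusters:
        key = frozenset(cluster)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

def size_stats(clusters):
    """Return (mean, population standard deviation) of the cluster sizes."""
    sizes = [len(c) for c in clusters]
    return mean(sizes), pstdev(sizes)

def pick_level(cluster_sets, requirement="balanced"):
    """Pick a hierarchical level by a hypothetical requirement.

    "balanced": smallest spread of cluster sizes (lowest standard deviation).
    "large":    largest average cluster size (highest mean).
    """
    stats = {level: size_stats(dedupe(cs)) for level, cs in cluster_sets.items()}
    if requirement == "balanced":
        return min(stats, key=lambda level: stats[level][1])
    if requirement == "large":
        return max(stats, key=lambda level: stats[level][0])
    raise ValueError(f"unknown requirement: {requirement!r}")

# Toy cluster sets keyed by hierarchical level (illustrative data only).
cluster_sets = {
    1: [{"a", "b"}, {"b", "a"}, {"c", "d", "e", "f"}],   # contains one duplicate
    2: [{"a", "b", "c"}, {"d", "e", "f"}],
    3: [{"a", "b", "c", "d", "e"}, {"e", "f"}],          # overlapping clusters
}

print(pick_level(cluster_sets, "balanced"))
print(pick_level(cluster_sets, "large"))
```

Different requirements select different levels from the same data, which is the sense in which optimality depends on the requirement.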
