Towards effective discovery of natural communities in complex networks and implications in e-commerce

Automated community detection is an important problem in the study of complex networks. It is closely related to data clustering in pattern recognition, the task of grouping similar objects together and separating dissimilar ones. Community detection can be viewed as finding groups of densely interconnected nodes that have few connections to nodes outside the group. This paper proposes a node similarity measure that scores the similarity between two nodes by considering both their neighbors and their non-neighbors. It then introduces a method that identifies communities in complex networks by combining this similarity measure with data clustering. A significant characteristic of the proposed method is that it needs no prior knowledge of the actual communities of a network. Extensive experiments are reported on several real-world and artificial networks with known ground-truth communities. The proposed method is compared with various state-of-the-art community detection algorithms using several criteria, including normalized mutual information and F-measure. The method has also been applied to a recommender system, a tool that is rapidly becoming crucial in e-commerce applications. The empirical results suggest that the proposed technique has the potential to improve the performance of a recommender system, and hence that it may be useful for other e-commerce applications.
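
The abstract does not give the formula of the proposed similarity measure, so the sketch below is only an illustration of the general idea, not the authors' method. It assumes a simple matching coefficient: two nodes are scored by the fraction of the remaining nodes on which they agree, either as a shared neighbor or as a shared non-neighbor. The function names (`adjacency`, `matching_similarity`, `group_by_similarity`), the threshold value, and the single-linkage grouping step are all hypothetical choices made for this example.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def matching_similarity(adj, u, v):
    """Illustrative measure (an assumption, not the paper's formula).

    Returns the fraction of the other nodes w for which u and v agree:
    w is a neighbor of both, or a neighbor of neither.
    """
    others = set(adj) - {u, v}
    if not others:
        return 0.0
    agree = sum(1 for w in others if (w in adj[u]) == (w in adj[v]))
    return agree / len(others)


def group_by_similarity(adj, threshold):
    """Single-linkage grouping: merge every pair scoring >= threshold."""
    parent = {n: n for n in adj}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in combinations(adj, 2):
        if matching_similarity(adj, u, v) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for n in adj:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Toy network: two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = adjacency(edges)
communities = group_by_similarity(adj, threshold=0.7)
print(communities)  # [[0, 1, 2], [3, 4, 5]]
```

On this toy graph the two triangles are recovered without supplying the number of communities in advance, which mirrors the property the abstract emphasizes. In a large sparse network almost every pair of nodes shares many non-neighbors, so an unweighted count like this one would likely need some form of normalization or weighting.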
