Relative Validity Criteria for Community Mining Algorithms

Grouping data points is one of the fundamental tasks in data mining; when the data points are described by attributes, this task is commonly known as clustering. When the data are interrelated, have no attributes, and are represented as nodes and their relationships, the task is also referred to as community mining. A considerable number of approaches have been proposed in recent years for mining communities in a given network, but little work has been done on how to evaluate community mining results. The common practice is to use an agreement measure to compare the mining result against a ground truth; however, the ground truth is unknown in most real-world applications. In this paper, we investigate relative clustering quality measures defined for evaluating clusterings of data points with attributes, and we propose adaptations that make them applicable in the context of social networks. These relative criteria can be used not only as metrics for evaluating the quality of groupings but also as objectives for designing new community mining algorithms.
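The abstract does not spell out the proposed adaptations. As a minimal illustration of what a relative criterion on a network can look like, the sketch below computes the mean silhouette width (Rousseeuw, 1987), a classical relative clustering criterion, after substituting shortest-path hop distance for the attribute-based distance. The hop-distance choice and the function names `hop_distances` and `graph_silhouette` are assumptions made for this example; they are not necessarily the adaptation the paper proposes. The sketch assumes an unweighted, undirected, connected graph and a partition into at least two communities.

```python
from collections import deque


def hop_distances(adj, source):
    """BFS shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_silhouette(adj, communities):
    """Mean silhouette width of a partition, using hop distance as dissimilarity.

    Assumptions (hypothetical adaptation, for illustration only):
      adj: dict mapping each node to an iterable of neighbours
           (undirected, connected graph).
      communities: list of at least two disjoint sets covering all nodes.
    """
    label = {n: i for i, c in enumerate(communities) for n in c}
    scores = []
    for node in adj:
        d = hop_distances(adj, node)
        own = communities[label[node]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the node's own community
        a = sum(d[m] for m in own if m != node) / (len(own) - 1)
        # b: smallest mean distance from the node to any other community
        b = min(
            sum(d[m] for m in c) / len(c)
            for i, c in enumerate(communities)
            if i != label[node]
        )
        # a, b >= 1 here because distinct nodes are at least one hop apart
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge 2-3.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(graph_silhouette(adj, [{0, 1, 2}, {3, 4, 5}]))  # natural split
    print(graph_silhouette(adj, [{0, 3, 4}, {1, 2, 5}]))  # scrambled split
```

On this toy graph the natural two-triangle split scores approximately 0.55, while the scrambled split scores below zero. No ground truth is consulted, which is what makes such a score a relative criterion: it can rank candidate partitions directly, and in principle the same function could be maximized as a community mining objective. Recomputing BFS from every node costs O(n(n + m)) time, so this sketch is only suitable for small graphs.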
