Data Mining on Social Interaction Networks

Social media and social networks have become woven into the fabric of everyday life. This has led to a dramatic increase in social data capturing various relations between users and their associated artifacts, both in online networks and in the real world via ubiquitous devices. In this work, we consider social interaction networks from a data mining perspective, with a special focus on real-world face-to-face contact networks. We combine data mining and social network analysis techniques to examine these networks in order to improve our understanding of the data, the modeled behavior, and its underlying emergent processes. Furthermore, we adapt, extend, and apply known predictive data mining algorithms to social interaction networks. Additionally, we present novel descriptive data mining methods for uncovering and extracting relations and patterns for hypothesis generation and exploration, in order to provide characteristic information about the data and networks. The presented approaches and methods aim at extracting valuable knowledge for enhancing the understanding of the respective data and for supporting the users of the respective systems. We consider data from several social systems, such as the social bookmarking system BibSonomy, the social resource sharing system Flickr, and ubiquitous social systems; specifically, we focus on data from the social conference guidance system Conferator and the social group interaction system MyGroup. This work first gives a short introduction to social interaction networks and then describes several analysis results in the context of online social networks and real-world face-to-face contact networks. Next, we present predictive data mining methods, namely for localization, recommendation, and link prediction. Finally, we present novel descriptive data mining methods for mining communities and patterns.
