Modeling and Mining Ubiquitous Social Media

Community detection is an unsupervised learning task that discovers groups whose members are more similar to one another, or interact more frequently with one another, than with people outside the group. In social media, link information can reveal heterogeneous relationships of varying strength, but it is often noisy. Different sources of social media data provide complementary information: for example, bookmarking and tagging data indicate user interests, and commenting frequency suggests the strength of ties. We therefore propose integrating social media data of multiple types to improve community detection, and we present a joint optimization framework for doing so. Empirical evaluation on both synthetic and real-world social media data shows a significant performance improvement for the proposed approach. This work elaborates on the need for, and the challenges of, integrating multiple heterogeneous data sources, and provides a principled approach to multi-source community detection.
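
To make the idea of combining sources concrete, the following is a minimal sketch and not the joint optimization framework described above, whose details are not given here. It assumes a simple stand-in technique: each source is turned into a degree-normalized affinity matrix, the matrices are averaged, and the nodes are split into two groups by the sign of the second-largest eigenvector. The names `normalize`, `two_communities`, `links`, and `tags`, and the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def normalize(A):
    """Symmetric degree normalization D^{-1/2} A D^{-1/2} (assumed preprocessing)."""
    d = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against isolated nodes
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def two_communities(sources, weights=None):
    """Split nodes into two groups from several symmetric affinity matrices.

    Illustrative stand-in only: averages the normalized sources and uses the
    sign of the second-largest eigenvector of the combined matrix.
    """
    if weights is None:
        weights = [1.0 / len(sources)] * len(sources)
    combined = sum(w * normalize(A) for w, A in zip(weights, sources))
    _, vecs = np.linalg.eigh(combined)   # eigenvalues in ascending order
    second = vecs[:, -2]                 # eigenvector of the second-largest eigenvalue
    labels = (second > 0).astype(int)
    if labels[0] == 1:                   # eigenvector sign is arbitrary; pin node 0 to group 0
        labels = 1 - labels
    return labels

# Toy link data: two triangles {0,1,2} and {3,4,5} joined by one noisy link 2-3.
links = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    links[i, j] = links[j, i] = 1.0

# Toy tag similarity: users in the same group share tags, users across groups do not.
tags = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                tags[i, j] = 1.0

labels = two_communities([links, tags])
print(labels)
```

On this toy input, nodes 0 to 2 and nodes 3 to 5 should land in different groups; the tag source reinforces the structure that the noisy link 2-3 partly blurs. Uniform weights are an arbitrary choice here, and a real system would need to decide how much to trust each source.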
