Mining hidden community in heterogeneous social networks

Social network analysis has attracted much attention in recent years, and community mining is one of its major directions. Most existing community mining methods assume that the network contains only one kind of relation and, moreover, produce results that are independent of the users' needs or preferences. In reality, however, there exist multiple, heterogeneous social networks, each representing a particular kind of relationship, and each kind of relationship may play a distinct role in a particular task. Mining networks under the assumption of a single kind of relation may therefore miss much valuable hidden community information and may not adapt to the diverse information needs of different users.

In this paper, we systematically analyze the problem of mining hidden communities on heterogeneous social networks. Based on the observation that different relations have different importance with respect to a given query, we propose a new method for learning an optimal linear combination of these relations that best meets the user's expectation. With the combined relation, better performance can be achieved for community mining. Our approach to social network analysis and community mining represents a major shift in methodology from the traditional one: a shift from single-network, user-independent analysis to multi-network, user-dependent, query-based analysis. Experimental results on the Iris data set and the DBLP data set demonstrate the effectiveness of our method.
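To make the idea of a query-dependent linear combination of relations concrete, the following is a minimal sketch, not the method of the paper itself. It assumes each relation is given as a small weight matrix, that the user's query is a handful of nodes labeled with community ids, and that the weights are fitted by regularized least squares so that the combined relation is close to 1 for pairs in the same labeled community and close to 0 otherwise. The function names, the `ridge` parameter, and the toy data are all hypothetical.

```python
from itertools import combinations


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to A and b together.
    aug = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def learn_relation_weights(relations, labels, ridge=1e-6):
    """Fit one weight per relation from a few user-labeled nodes.

    relations: list of n x n matrices, one per relation type (assumed input).
    labels: dict mapping node index -> community id chosen by the user.
    """
    pairs = list(combinations(sorted(labels), 2))
    # One row per labeled pair, one column per relation.
    X = [[rel[i][j] for rel in relations] for i, j in pairs]
    # Target is 1 when the user placed both nodes in the same community.
    y = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    m = len(relations)
    # Normal equations (X^T X + ridge * I) a = X^T y.
    gram = [[sum(r[p] * r[q] for r in X) + (ridge if p == q else 0.0)
             for q in range(m)] for p in range(m)]
    rhs = [sum(r[p] * t for r, t in zip(X, y)) for p in range(m)]
    return solve_linear_system(gram, rhs)


def combine_relations(relations, weights):
    """Return the weighted sum of the relation matrices."""
    n = len(relations[0])
    return [[sum(w * rel[i][j] for w, rel in zip(weights, relations))
             for j in range(n)] for i in range(n)]


# Toy data: relation_a links nodes 0-1 and 2-3, relation_b links 0-2 and 1-3.
relation_a = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
relation_b = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
# Hypothetical user query: nodes 0 and 1 belong together, as do 2 and 3.
user_labels = {0: "x", 1: "x", 2: "y", 3: "y"}

weights = learn_relation_weights([relation_a, relation_b], user_labels)
combined = combine_relations([relation_a, relation_b], weights)
```

In this toy case the labeled examples agree with `relation_a` and contradict `relation_b`, so the fit places nearly all of the weight on `relation_a`. The resulting `combined` matrix could then be passed to any single-network community mining step, such as a graph clustering routine.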
