In the analysis of large-scale social networks, a central problem is discovering how the members of the network under analysis are related. Instant messaging (IM) is a popular and relatively new form of social interaction. In this paper we study IM communities as social networks. An obvious barrier to such a study is that there is no de facto measure of how closely any pair of members of such a community is associated, and hence no standard way to describe the link information. We introduce several such measures in this paper. The proposed measures are obtained solely from the status logs of IM users. The status log of an IM user is a list of pairs of the form (time, state), where state is an element of a small set, such as {online, offline, busy, away}, and time is the time at which the user switched into that state. Resig et al. show [12] that, in spite of their simplicity, status logs contain a great deal of structure. Since two IM users can message each other only if they are both online at the same time, it seems reasonable to guess that two IM users who are frequently online at the same time may in fact be frequently messaging each other. This hypothesis forms the basis of each of our association measures. For a chosen population of IM users, we compare the social networks obtained using our association measures to the social network formed in LiveJournal (www.livejournal.com) by the same population. LiveJournal is a blogging community that allows users to explicitly name other LiveJournal users as associates. The network obtained from these association lists thus acts as a control of sorts for validating our IM-based association measures.
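To make the co-online hypothesis concrete, the following is a minimal sketch of one possible association score computed from two status logs. It is not one of the measures proposed in the paper, which are not specified in this abstract. The function names (`online_intervals`, `overlap`, `co_online_score`), the choice to count only the `online` state as available, the `end_time` cutoff, and the Jaccard-style normalization are all assumptions made for illustration.

```python
from typing import List, Tuple

# A status log is a list of (time, state) pairs, assumed sorted by time.
StatusLog = List[Tuple[float, str]]
Interval = Tuple[float, float]


def online_intervals(log: StatusLog, end_time: float) -> List[Interval]:
    """Turn a status log into [start, end) intervals spent in the 'online' state.

    Assumption: only 'online' counts as available; 'busy' and 'away' do not.
    """
    intervals: List[Interval] = []
    start = None
    for time, state in log:
        if state == "online" and start is None:
            start = time                      # user came online
        elif state != "online" and start is not None:
            intervals.append((start, time))   # user left the online state
            start = None
    # Close an interval that is still open at the end of the observation window.
    if start is not None and start < end_time:
        intervals.append((start, end_time))
    return intervals


def overlap(a: List[Interval], b: List[Interval]) -> float:
    """Total time covered by both sorted, non-overlapping interval lists."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def co_online_score(log_a: StatusLog, log_b: StatusLog, end_time: float) -> float:
    """Hypothetical score: time both users are online / time either is online."""
    a = online_intervals(log_a, end_time)
    b = online_intervals(log_b, end_time)
    both = overlap(a, b)
    either = sum(e - s for s, e in a) + sum(e - s for s, e in b) - both
    return both / either if either > 0 else 0.0


# Illustrative logs with made-up timestamps (e.g. minutes since midnight).
alice = [(0, "online"), (10, "offline"), (20, "online"), (30, "away")]
bob = [(5, "online"), (25, "offline")]
score = co_online_score(alice, bob, end_time=40)
```

A score near 1 means the two users are almost always online together, and a score of 0 means they never are. Scores for all pairs in a population could then serve as edge weights in a candidate social network, which is the kind of network the abstract proposes to compare against LiveJournal association lists.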
[1] Heikki Mannila, et al. Similarity of Attributes by External Probes. KDD, 1998.
[2] Ankur Teredesai, et al. A Framework for Mining Instant Messaging Services. 2004.
[3] J. Hartigan. Direct Clustering of a Data Matrix. 1972.
[4] Anil K. Jain, et al. Algorithms for Clustering Data. 1988.
[5] Zhang Yi, et al. Clustering Categorical Data. Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), 2000.
[6] Jon M. Kleinberg, et al. Clustering categorical data: an approach based on dynamical systems. The VLDB Journal, 2000.
[7] Ted E. Senator, et al. Countering terrorism through information technology. CACM, 2004.
[8] Inderjit S. Dhillon, et al. Information-theoretic co-clustering. KDD '03, 2003.
[9] Johannes Gehrke, et al. CACTUS—clustering categorical data using summaries. KDD '99, 1999.
[10] Alberto Escudero-Pascual, et al. Questioning lawful access to traffic data. CACM, 2004.
[11] J. A. Hartigan, et al. A k-means clustering algorithm. 1979.