A Framework for Mining Instant Messaging Services

Developing a framework for the analysis of large-scale mass-communication media such as instant messaging (popularly known as IM) has gone largely unexplored until now. This paper explores various data mining issues and how they relate to instant messaging and current counterterrorism efforts. Specific topics include user pattern analysis, anomaly detection, textual topic detection under limited message sizes, and largely generic social network analysis in this context. Several interesting questions are posed, and the framework currently under development explores some of the