Structural analysis of chat messages for topic detection

Purpose – The purpose of this research is to study the characteristics of chat messages by analysing a collection of 33,121 sample messages gathered from 1,700 conversation sessions between 72 pairs of MSN Messenger users over a four-month period from June to September 2005. The primary objective of chat message characterization is to understand the properties of chat messages for effective message analysis, such as message topic detection.

Design/methodology/approach – Based on the study of chat message characteristics, an indicative term‐based categorization approach for chat topic detection is proposed. The approach incorporates techniques such as sessionalisation of chat messages and extraction of features from icon texts and URLs for message pre‐processing. Naive Bayes, Associative Classification, and Support Vector Machine are employed as classifiers for categorizing topics from chat sessions.

Findings – The indicative term‐based approach is superior to the traditional documen...
