An e-mail analysis method based on text mining techniques

This paper proposes a method that uses text mining techniques to analyze e-mails collected at a customer center. The method relies on two kinds of domain-dependent knowledge: a key concept dictionary provided manually by human experts, and a concept relation dictionary acquired automatically by a fuzzy inductive learning algorithm. Given the subject and body of an e-mail, the method assigns the e-mail to a text class. It also extracts key concepts from e-mails and presents statistical information about them. The method is applied to three analysis tasks: a product analysis task, a contents analysis task, and an address analysis task. Numerical experiments indicate that the acquired concept relation dictionaries agree with the intuitions of the customer center's operators and yield high precision ratios in classification.
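The pipeline described above (extract key concepts with a dictionary, classify with a concept relation dictionary, then aggregate concept statistics) can be illustrated with a minimal sketch. It makes several loud assumptions: the dictionary entries, rule format, class names, and every function name (`extract_concepts`, `classify`, `concept_statistics`) are hypothetical; expressions are matched by plain substring lookup rather than any linguistic analysis; and the concept relation dictionary, which the paper learns with a fuzzy inductive learning algorithm, is replaced here by hand-written rules carrying a certainty degree. It is not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical key concept dictionary: surface expression -> (concept class, key concept).
# In the paper this dictionary is built by human experts; these entries are invented.
KEY_CONCEPTS = {
    "printer": ("product", "Printer"),
    "scanner": ("product", "Scanner"),
    "broken": ("trouble", "Failure"),
    "does not work": ("trouble", "Failure"),
    "price": ("inquiry", "Price"),
    "how much": ("inquiry", "Price"),
}


@dataclass(frozen=True)
class Rule:
    """One hypothetical concept relation entry (a stand-in for a learned rule)."""
    concepts: frozenset  # key concepts that must all occur in the e-mail
    text_class: str      # class assigned when the rule fires
    certainty: float     # degree in [0, 1]; assumed, not taken from the paper


# Hand-written stand-in for the automatically acquired concept relation dictionary.
RULES = [
    Rule(frozenset({"Printer", "Failure"}), "printer trouble", 0.9),
    Rule(frozenset({"Scanner", "Failure"}), "scanner trouble", 0.8),
    Rule(frozenset({"Price"}), "price inquiry", 0.7),
]


def extract_concepts(subject: str, body: str) -> set:
    """Return the key concepts whose expressions occur in the subject or body."""
    text = f"{subject} {body}".lower()
    return {concept for expr, (_cls, concept) in KEY_CONCEPTS.items() if expr in text}


def classify(concepts: set, default: str = "other") -> str:
    """Return the class of the firing rule with the highest certainty, else `default`."""
    best_class, best_score = default, 0.0
    for rule in RULES:
        # A rule fires only if all of its concepts were extracted from the e-mail.
        if rule.concepts <= concepts and rule.certainty > best_score:
            best_class, best_score = rule.text_class, rule.certainty
    return best_class


def concept_statistics(emails) -> Counter:
    """Count how many e-mails mention each key concept (each concept once per e-mail)."""
    counts = Counter()
    for subject, body in emails:
        counts.update(extract_concepts(subject, body))
    return counts


if __name__ == "__main__":
    inbox = [
        ("Printer broken", "My printer does not work since yesterday."),
        ("Question", "How much is the new scanner?"),
    ]
    for subject, body in inbox:
        print(subject, "->", classify(extract_concepts(subject, body)))
    print(concept_statistics(inbox))
```

Keeping extraction and classification separate mirrors the two kinds of knowledge in the abstract: the expert-built dictionary only decides which concepts are present, while the relation rules alone decide the class, so either dictionary could be swapped out independently.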
