An Assessment of Case Base Reasoning for Short Text Message Classification

Message classification is a text classification task that has provoked much interest in machine learning. One aspect of message classification that presents a particular challenge is the classification of short text messages. This paper presents an assessment of applying a case-based reasoning approach that was developed for long text messages (specifically spam filtering) to short text messages. The evaluation involves determining the most appropriate feature types and feature representation for short text messages and then comparing the performance of the case-based classifier with both a Naive Bayes classifier and a Support Vector Machine. Our evaluation shows that short text messages require different features, and even different classifiers, than long text messages. A machine learner that classifies text messages will therefore require some configuration of both its features and its classifier.
