Weighted Voting of Different Term Weighting Methods for Natural Language Call Routing

This paper considers the text classification problem for natural language call routing. Seven term weighting methods were applied. Two dimensionality reduction approaches were considered: the combination of stop-word filtering with stemming, and a feature transformation based on term belonging to classes. kNN and SVM-FML were used as classification algorithms. The paper proposes voting across the different term weighting methods: a majority vote over the seven methods significantly improves classification effectiveness. Weighted voting, with the weights optimized by a self-adjusting genetic algorithm, was then investigated, and the numerical results show that it provides a further improvement. The improvement is especially pronounced with the feature transformation based on term belonging to classes, which reduces the dimensionality radically, down to the number of classes. The approach can therefore be useful for real-time systems such as natural language call routing.
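The voting step can be illustrated with a minimal sketch. It assumes that each of the seven term weighting methods has already produced one predicted class label for an utterance. The function name `weighted_vote`, the class labels, the weight values, and the tie-breaking rule are all hypothetical. In the paper the weights come from a self-adjusting genetic algorithm, which is not reproduced here. With equal weights the function reduces to a plain majority vote.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine one predicted label per term weighting method.

    predictions: list of class labels, one per method.
    weights: non-negative numbers, one per method; equal weights
             (a plain majority vote) are used when omitted.
    Ties go to the tied label that appears first in `predictions`
    (an assumption of this sketch, not taken from the paper).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")

    # Accumulate the total weight behind each candidate label.
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight

    # Pick the label with the highest total; first occurrence wins ties.
    best = max(scores.values())
    for label in predictions:
        if scores[label] == best:
            return label


# Hypothetical outputs of seven term weighting methods for one utterance.
predictions = ["billing", "billing", "operator", "billing",
               "operator", "operator", "operator"]

# Hand-picked illustrative weights; the paper optimizes these instead.
weights = [0.30, 0.25, 0.05, 0.20, 0.05, 0.10, 0.05]

majority_label = weighted_vote(predictions)           # 4 votes vs 3
weighted_label = weighted_vote(predictions, weights)  # 0.75 vs 0.25
```

In this example the majority vote selects `operator`, while the weighted vote selects `billing`, because the three methods that agree on `billing` carry most of the total weight. This shows how tuned weights can overturn a simple majority.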