Automating Complaints Processing in the Food and Economic Sector: A Classification Approach

Text categorization is a supervised learning task that assigns labels to documents using a classifier trained on a set of labelled documents. Applying text classification to label reports and complaints in economic and health-related fields can greatly increase the speed at which they are processed, and therefore reduce the time needed to act on them. In this work, we aim to classify complaints into the 4 main economic activities defined by the Portuguese Economic and Food Safety Authority. We evaluate the classification performance of 9 algorithms (Complement Naive Bayes, Bernoulli Naive Bayes, Multinomial Naive Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, AdaBoost and Logistic Regression) at different levels of text preprocessing. Results reveal high levels of accuracy, around 85%. We also observed that the linear classifiers (Support Vector Machine and Logistic Regression) achieved higher F1-measure values than the other classifiers, in addition to high accuracy. We conclude that these linear classifiers are better suited to the selected data, and that text classification methods can facilitate the processing of complaints and reports, which in turn leads to swifter action by the authorities in charge. Relying on text classification of reports and complaints can thus have a positive influence on both economic crime prevention and public health, in this case by means of food-related inspections.
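The abstract reports both accuracy and F1-measure, and the two can diverge when classes are unevenly represented. The following is a minimal sketch of the standard definitions of accuracy and macro-averaged F1, not the evaluation code used in this work. The class names `I` to `IV`, the toy predictions, and the choice of macro averaging are assumptions made purely for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy data: four classes, one of them dominant.
y_true = ["I"] * 7 + ["II", "III", "IV"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["I"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.700
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.206
```

In this toy case a classifier that ignores three of the four classes still reaches 70% accuracy, while its macro F1 is about 0.21. This is why a higher F1-measure alongside high accuracy is a stronger indication of classifier quality than accuracy alone.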
