Recently, the growing Government 2.0 movement has made it easy to report complaints. However, government responses are increasingly delayed, because the rising number of complaint reports overloads the government's capacity to handle them. In this paper, we propose a method of automatically categorizing complaint reports as a first step toward reducing the pressure on the government side. We conducted experiments in categorizing complaint reports, and the results showed the following: (1) Feature selection is key to improving the accuracy (F-score) of complaint report categorization. The words that are strongly effective for categorization make up about 3.9% of all distinct words. (2) The proposed mutual-information (MI)-based methods outperform a conventional random-forest (RF)-based method. (3) The city management section seems to classify complaint reports by focusing on the demands expressed in the reports. (4) Categorization performance is usually high if the training data includes various categories of data.
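As a rough illustration of finding (1), MI-based feature selection can be sketched as follows. This is a minimal sketch under stated assumptions, not the method evaluated in the paper: it assumes each report is already tokenized into a set of words, scores each word by the mutual information between its presence and a category, and keeps a fixed fraction of the vocabulary. The function names, the `keep_ratio` parameter, and the toy reports are hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term, category):
    """MI (in bits) between 'term occurs in the report' and 'report is in category'."""
    n = len(docs)
    # Count the four (term present?, in category?) combinations.
    counts = Counter((term in doc, label == category)
                     for doc, label in zip(docs, labels))
    mi = 0.0
    for t in (True, False):
        for c in (True, False):
            joint = counts[(t, c)] / n
            if joint == 0:
                continue  # a zero joint probability contributes nothing
            p_t = (counts[(t, True)] + counts[(t, False)]) / n
            p_c = (counts[(True, c)] + counts[(False, c)]) / n
            mi += joint * math.log2(joint / (p_t * p_c))
    return mi

def select_features(docs, labels, keep_ratio):
    """Keep the top fraction of words, ranked by their best MI over all categories."""
    vocab = sorted({w for doc in docs for w in doc})
    categories = sorted(set(labels))
    scores = {w: max(mutual_information(docs, labels, w, c) for c in categories)
              for w in vocab}
    k = max(1, round(len(vocab) * keep_ratio))
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]

# Hypothetical toy reports (sets of tokens) and their categories.
docs = [
    {"pothole", "road", "repair"},
    {"road", "crack", "repair"},
    {"streetlight", "broken", "repair"},
    {"streetlight", "dark", "repair"},
]
labels = ["road", "road", "light", "light"]

selected = select_features(docs, labels, keep_ratio=0.3)
print(selected)
```

In this toy data, a word such as "repair" that appears in every report carries no information about the category and is ranked last, while words that occur in only one category are ranked first. A ratio near the reported 3.9% would only make sense on a much larger vocabulary.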