Paper Information - NEWS CLASSIFICATION WITH HUMAN ANNOTATORS: A CASE STUDY

NEWS CLASSIFICATION WITH HUMAN ANNOTATORS: A CASE STUDY

Classifying text documents has become an increasingly active research field with the growth of online news. While most news on news websites is categorised manually, the task grows more strenuous given the tremendous surge of data updates every day. This paper addresses the question of how text classification algorithms can take over this task from manual classification. A combined method using Bracewell's algorithm and a top-n method is demonstrated and tested on an Indonesian-language corpus. The experiment also uses human evaluation as the benchmark. The results of the human evaluation are further investigated in order to understand how the annotators classify documents and which aspects can be improved to enhance the method in the future. The results indicate that the method can outperform human annotators by 13% in terms of accuracy.

Jafreezal Jaafar | Norshuhani Zamin | Aini Fuddoly
