Performance Comparison on Automated Generation of Coding Rules: A Case Study on ISO 26000

When texts are mined for meaningful information, an important step is constructing a coding rule that assigns key terms to conceptual categories. Such rules are usually written by hand and therefore tend to be subjective. This study builds coding rules automatically from the ISO 26000 document using two proposed methods, one of them based on support vector machines (SVM). The automatically generated rules were compared with manually created coding rules, and the SVM method was found to be the more effective of the two.
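To make the notion of a coding rule concrete, the following is a minimal sketch of one as a mapping from categories to key terms, applied to a sentence by simple token matching. It is not the method evaluated in the study: the category names follow three of the ISO 26000 core subjects, while the term lists, the `CODING_RULE` name, and the `apply_coding_rule` helper are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical coding rule: category -> set of key terms.
# Categories are named after ISO 26000 core subjects; the term
# lists are illustrative assumptions, not taken from the study.
CODING_RULE = {
    "human_rights": {"discrimination", "rights", "grievance"},
    "labour_practices": {"employment", "workers", "wages"},
    "environment": {"pollution", "emissions", "biodiversity"},
}


def apply_coding_rule(text, rule):
    """Count how many tokens in `text` fall into each category of `rule`."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, terms in rule.items():
            if token in terms:
                counts[category] += 1
    return counts


sample = "Workers should receive fair wages, and emissions should be reduced."
print(apply_coding_rule(sample, CODING_RULE))
```

A hand-written rule of this kind reflects the coder's own judgment about which terms belong to which category, which is the subjectivity that automatic generation aims to reduce.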
