Abstract Pornography on social media has many negative effects and can harm the morals of children and teenagers, so its spread on social media must be prevented. Twitter is one of the social media platforms often used as a medium for pornography, which appears there as both text and images. Of the two, text is particularly interesting to study because of the variety of languages used. In this study, classification is performed on Indonesian tweets, English tweets, and a combination of both languages. Three machine learning methods, Decision Tree, Naive Bayes, and Support Vector Machines, are used in order to compare which performs best in the classification process. An additional experiment was also carried out with the aim of improving classification performance. The results show that the level of accuracy is quite high. However, differences in grammar are a constraint that affects the accuracy of the classification results.
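As an illustration of one of the three methods named above, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written with the Python standard library only. The abstract does not specify the features, preprocessing, or implementation used in the study, so everything here is an assumption: the whitespace tokenizer, the function names `train_naive_bayes` and `predict`, the label names, and the tiny mixed Indonesian/English toy corpus, in which `explicit_term_a` and `explicit_term_b` are placeholder tokens standing in for flagged vocabulary.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real tweets need more cleaning.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    # Count documents per class and word occurrences per class.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # Log prior plus sum of log likelihoods with add-one smoothing.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus mixing Indonesian and English; not data from the study.
TRAIN_DOCS = [
    "explicit_term_a video link",
    "explicit_term_b foto gratis",
    "selamat pagi semua teman",
    "good morning everyone have a nice day",
]
TRAIN_LABELS = ["flagged", "flagged", "benign", "benign"]

model = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
print(predict(model, "foto explicit_term_a"))
print(predict(model, "selamat morning teman"))
```

A comparison like the one described in the abstract would train Decision Tree and Support Vector Machine classifiers on the same features and evaluate all three on held-out tweets; that part is omitted here.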