Preventing minors from accessing pornographic web pages remains a problem in Vietnam. Existing tools fail to block Vietnamese-language sites automatically and rely solely on manually configured blacklists and whitelists. Because Vietnamese and English differ in both syntax and semantics, methods designed for English are likely to be much less effective when applied to Vietnamese. In this article, we present an automatic system that uses the Naïve Bayes machine learning method to filter pornographic web pages in Vietnamese without noticeably affecting users or system performance. The system has two components: a classifier, which is developed and deployed on a server, and an extension, which is integrated into the end user's browser. The system is thus managed from both the server side and the client side to achieve the best result. This article considers only text, not images.
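As an illustration of the classification step, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the system described above: the class name `NaiveBayesTextFilter`, the labels `block` and `allow`, the whitespace tokenizer, and the toy training data are all assumptions made for the example. In particular, real Vietnamese text would need proper word segmentation rather than splitting on spaces.

```python
import math
from collections import Counter


class NaiveBayesTextFilter:
    """Hypothetical multinomial Naive Bayes classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha          # smoothing constant (assumed value)
        self.word_counts = {}       # label -> Counter of token frequencies
        self.doc_counts = Counter() # label -> number of training documents
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Assumption: whitespace tokenization. Vietnamese words often span
        # several syllables, so a real system would use a word segmenter.
        return text.lower().split()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = self.tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in self.tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log(
                    (counts[tok] + self.alpha)
                    / (total_tokens + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (placeholder English tokens, purely illustrative).
train = [
    ("explicit adult video", "block"),
    ("adult explicit gallery", "block"),
    ("weather news today", "allow"),
    ("football news results", "allow"),
]
clf = NaiveBayesTextFilter().fit([t for t, _ in train], [l for _, l in train])

print(clf.predict("adult video gallery"))
print(clf.predict("news today football"))
```

In a two-sided deployment like the one described, a classifier of this kind would plausibly run on the server, with the browser extension sending page text for scoring and acting on the returned label; that division of work is an assumption of this sketch rather than a detail stated in the abstract.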