A Utility-Based Web Content Sensitivity Mining Approach
Abnormal remarks on the World Wide Web, such as violence, threats, and superstition, may disturb social order and public morality. Most traditional methods filter a page as soon as it contains a keyword from a predefined blacklist. Such methods cannot provide a quantitative measure of how sensitive the content is. In this paper, we propose a utility-based Web content sensitivity mining approach, in which utility is the measure of how sensitive a page is. This allows Internet regulators to take different actions according to different sensitivity values. We apply our approach to a real-world Web dataset, where it identified a number of sensitive Web pages that traditional frequency-based methods failed to find. By varying the sensitivity values of the keywords, different sets of high-sensitivity keywords were discovered.
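To make the contrast with blacklist filtering concrete, the following is a minimal sketch of one way a utility-style sensitivity score could be computed. It is not the paper's algorithm: the abstract does not give the utility formula, so the score here is assumed to be the sum of keyword occurrence counts multiplied by per-keyword sensitivity values. The keyword weights, the thresholds, and the function names `page_utility`, `blacklist_hit`, and `action` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical per-keyword sensitivity values (assumed for illustration,
# not taken from the paper).
SENSITIVITY = {"threat": 5.0, "violence": 4.0, "superstition": 1.5}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def page_utility(text, weights=SENSITIVITY):
    """Assumed utility: sum of (occurrence count * sensitivity value)."""
    counts = Counter(tokenize(text))
    return sum(counts[keyword] * value for keyword, value in weights.items())


def blacklist_hit(text, weights=SENSITIVITY):
    """Traditional filter: True as soon as any blacklisted keyword appears."""
    return any(token in weights for token in tokenize(text))


def action(utility, review_at=4.0, block_at=10.0):
    """Map a utility score to an operation; thresholds are hypothetical."""
    if utility >= block_at:
        return "block"
    if utility >= review_at:
        return "review"
    return "allow"


pages = {
    "a": "A history lecture on superstition in medieval towns.",
    "b": "Threat after threat, then violence: a threat to everyone.",
}

for name, text in pages.items():
    score = page_utility(text)
    print(name, blacklist_hit(text), score, action(score))
```

Both sample pages trigger the blacklist, but under these assumed weights they receive very different scores, so a regulator could allow one and block the other. Changing the values in `SENSITIVITY` changes which pages and keywords rank as highly sensitive, which is the kind of variation the abstract describes.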