Harmful comments extraction from a Bulletin Board System using word meaning and impression on thread context

Harmful documents on the Web make readers uncomfortable. To hide such documents from the public, machine learning methods have been proposed that learn the words used in harmful documents and hide those documents automatically. The learned words often have bad meanings. However, although a word's meaning does not change, its impression may change depending on context. Consequently, even when a document contains a word that carries a bad impression, previous learning methods cannot learn that word and therefore fail to hide the document. We adopt the following approach: a word's impression may change with context. If a word has been used together with words of good meaning, its impression is considered to be good as well. Conversely, if a word has been used together with words of bad meaning, its impression may be bad. This paper proposes a new method for extracting harmful comments from a thread of a Bulletin Board System (BBS). The proposed method extracts comments using both word meanings and word impressions derived from the thread context. We evaluated the proposed method on comments collected from four threads of the Japanese BBS "2-channel." The averaged precision of extraction was 0.47, and the averaged recall was 0.68. We verified that the proposed method is suitable for extracting harmful comments from a BBS thread.
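The abstract does not give the actual scoring formulas, so the following Python sketch only illustrates the stated intuition that a word used alongside bad-meaning words acquires a bad impression. Everything concrete here is a hypothetical assumption, not the authors' method: the tiny meaning lexicon (+1 good, -1 bad), treating co-occurrence within the same comment as "thread context", averaging neighbour scores to get an impression, the threshold of -0.3, and all function names. Real 2-channel input would be Japanese text that needs morphological analysis first; pre-split English tokens are used here only for readability.

```python
from collections import defaultdict

# Hypothetical meaning lexicon: +1.0 = good meaning, -1.0 = bad meaning.
# Words absent from the lexicon are treated as having no fixed meaning.
MEANING = {"stupid": -1.0, "idiot": -1.0, "nice": 1.0, "thanks": 1.0}

# Hypothetical thread: each comment is a list of pre-tokenized words.
THREAD = [
    ["you", "are", "stupid", "idiot"],
    ["nice", "work", "thanks"],
    ["stupid", "newbie"],
    ["newbie", "go", "away"],  # contains no bad-meaning word at all
]


def word_impressions(comments, meaning):
    """Assumed rule: a word's impression is the mean meaning score of the
    lexicon words it co-occurs with (same comment, other positions)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens in comments:
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j and other in meaning:
                    totals[word] += meaning[other]
                    counts[word] += 1
    return {w: totals[w] / counts[w] for w in counts}


def comment_score(tokens, meaning, impression):
    """Average per-word score: use the lexicon meaning when known,
    otherwise fall back to the context-derived impression."""
    scores = []
    for word in tokens:
        if word in meaning:
            scores.append(meaning[word])
        elif word in impression:
            scores.append(impression[word])
    return sum(scores) / len(scores) if scores else 0.0


def extract_harmful(comments, meaning, threshold=-0.3):
    """Return indices of comments whose score falls below the threshold."""
    impression = word_impressions(comments, meaning)
    return [
        idx
        for idx, tokens in enumerate(comments)
        if comment_score(tokens, meaning, impression) < threshold
    ]


if __name__ == "__main__":
    print(extract_harmful(THREAD, MEANING))  # prints [0, 2, 3]
```

In this toy thread, comment 3 contains no word with a bad meaning, yet it is flagged because "newbie" picked up a bad impression from appearing next to "stupid" in comment 2. A method that relies on word meanings alone would miss that case.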