Text Region Extraction for Noisy Spam Images

This paper addresses spam filtering for image spam, a fast-spreading type of spam in which the text is embedded in images to evade text-based spam filters. A common detection approach uses an optical character recognition (OCR) system to detect and recognize the embedded text, followed by a classifier that distinguishes spam from ham. However, spammers have begun obscuring the text in images to prevent OCR from detecting it. To compensate for these shortcomings of the OCR system, a text region detection method is proposed. To evaluate the performance of the proposed system, the method was applied to the publicly available Dredze image spam dataset. On spam images with complex backgrounds, it outperforms the original OCR system in practical use, and the test results indicate that it detects text regions well even in noisy images.