Text Extraction for Spam-Mail Image Filtering Using a Text Color Estimation Technique

In this paper, we propose an algorithm for extracting text regions from images in spam-mails. The Color Layer-Based Text Extraction(CLTE).It extracts connected components on the eight planes, and then classifies them into either text regions or non-text. We also propose an algorithm to recover damaged text strokes in Korean text images. There are two types of damaged strokes: (1) middle strokes such as '???' or '--' are deleted, and (2) the first and last strokes such as '???' or '???' are filled with black pixels. An experiment with 200 spammail images shows that the proposed approach is more accurate than conventional methods by over 10%.