Text Detection in Natural Images by Connected Component Analysis

Text in a digital image carries important cues for scene understanding and is useful for many applications. Detecting and extracting such text is difficult, however, because of variations in font size, text alignment, and font color. In this paper, we propose a connected-component-based method to automatically detect text regions in natural images. Since text regions in images consist mostly of repeated vertical strokes, we search for patterns of closely packed vertical edges. Once such a group of edges is found, neighboring vertical edges are connected to each other. Connected regions whose geometric features fall outside valid ranges are treated as outliers and eliminated. Compared to existing methods, the proposed method is more effective for slanted or curved characters. Experimental results are given to validate our approach.
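The pipeline summarized above (vertical edges, then linking of nearby edges, then connected components, then geometric filtering) can be illustrated with a minimal sketch. The abstract does not specify the edge operator, how close edges must be to be linked, or which geometric features are tested. The sketch therefore assumes a thresholded horizontal intensity difference as the vertical-edge detector, a hypothetical `max_gap` for linking edges on the same row, 8-connectivity for labeling, and hypothetical thresholds (`min_h`, `max_h`, `min_aspect`, `min_fill`) for outlier removal. It is not the authors' implementation.

```python
from collections import deque


def vertical_edges(img, thresh):
    """Mark pixel x when |I(y, x+1) - I(y, x)| > thresh.
    A large left-right intensity change indicates a vertical edge (stroke boundary).
    ASSUMPTION: a simple horizontal difference stands in for the paper's edge operator."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - 1):
            if abs(img[y][x + 1] - img[y][x]) > thresh:
                edges[y][x] = True
    return edges


def connect_horizontally(edges, max_gap):
    """Fill the gap between consecutive edge pixels on a row when they are at most
    max_gap apart, so closely packed strokes merge into one region.
    ASSUMPTION: max_gap is a hypothetical parameter."""
    out = [row[:] for row in edges]
    for y, row in enumerate(edges):
        last = None
        for x, is_edge in enumerate(row):
            if is_edge:
                if last is not None and x - last <= max_gap:
                    for k in range(last + 1, x):
                        out[y][k] = True
                last = x
    return out


def connected_components(mask):
    """Label 8-connected foreground regions with BFS.
    Returns a list of ((top, left, bottom, right), pixel_count)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right, count = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(((top, left, bottom, right), count))
    return comps


def is_text_like(box, count, min_h=3, max_h=100, min_aspect=1.0, min_fill=0.3):
    """Reject regions whose geometry is out of range (hypothetical thresholds):
    height, width/height aspect ratio, and fraction of the box that is foreground."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return (min_h <= height <= max_h
            and width / height >= min_aspect
            and count / (height * width) >= min_fill)


def detect_text(img, thresh=40, max_gap=4):
    """Return bounding boxes of candidate text regions in a grayscale image
    given as a list of rows of intensities."""
    edges = vertical_edges(img, thresh)
    merged = connect_horizontally(edges, max_gap)
    return [box for box, count in connected_components(merged) if is_text_like(box, count)]
```

In a small synthetic image with four closely spaced vertical strokes, the edges of the strokes are linked into one wide region that passes the filter. An isolated bright pixel (too short) and a long vertical line (aspect ratio too small) are discarded as outliers. Filtering on bounding-box geometry after labeling keeps the linking step simple; a stricter detector would add further features (for example stroke count or edge density per region).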