Text-based Image Segmentation Methodology

Abstract In computer vision, segmentation is the process of partitioning a digital image into multiple segments (sets of pixels). Segmentation is thus an unavoidable step in analysing an image. Segmentation of text-based images aims to retrieve specific information from the entire image; this information can be a line, a word, or even a character. This paper presents various methodologies for segmenting a text-based image at these levels of segmentation. The material serves as a guide and update for readers working on text-based segmentation in computer vision. First, the need for segmentation is justified in the context of text-based information retrieval. Then, the various factors affecting the segmentation process are discussed. Next, the levels of text segmentation are explored. Finally, the available techniques are reviewed with their strengths and weaknesses, and directions for quick reference are suggested. Special attention is given to handwriting recognition, since this area requires more advanced techniques for efficient information extraction and for reaching the ultimate goal of machine simulation of human reading.
