Extraction of Bibliography Information Based on Image of Book Cover

This paper describes a new system for extracting and classifying bibliography regions from a color image of a book cover. The system consists of three major components: preprocessing, color space segmentation, and text region extraction and classification. Preprocessing extracts the edge lines of the book, geometrically corrects the input image, and segments it into the front cover, spine, and back cover. As in most color image processing work, segmentation of the color space is an essential step. This system uses the HSI color space instead of the RGB color space. The color space is first segmented into achromatic and chromatic regions, and each of these regions is then segmented further to complete the color space segmentation. Text region extraction and classification follow. Text regions are determined after detecting two fundamental features: stroke width and local label width. By comparing the text regions on the front cover with those on the spine, all extracted text regions are classified into bibliography categories (author, title, publisher, and other information) without applying OCR.
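The abstract does not give the conversion formulas or thresholds used for the achromatic/chromatic split. The following is a minimal sketch, assuming the standard geometric RGB-to-HSI formulas (components in [0, 1], hue in degrees). It labels a pixel achromatic when its saturation is low or its intensity is near black or white. The thresholds `s_min`, `i_low`, and `i_high`, the bin counts, and the function names `rgb_to_hsi`, `is_achromatic`, and `segment_color_space` are all hypothetical. The "further segmentation" step shown here uses simple fixed binning, with gray levels split by intensity and chromatic pixels split by hue sector. It is a stand-in, not the paper's method.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (hue in degrees, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3.0
    if total == 0:
        return 0.0, 0.0, 0.0  # pure black: hue and saturation are undefined, report 0
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        hue = 0.0  # gray pixel (r == g == b): hue is undefined, report 0
    else:
        # Clamp guards against tiny floating-point overshoot outside [-1, 1].
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


def is_achromatic(s, i, s_min=0.15, i_low=0.1, i_high=0.9):
    """Hypothetical rule: low saturation, or intensity near black/white, means no reliable hue."""
    return s < s_min or i < i_low or i > i_high


def segment_color_space(pixels, n_gray=4, n_hue=6):
    """Label each RGB pixel as ('A', gray_level) or ('C', hue_sector) using fixed bins."""
    labels = []
    for r, g, b in pixels:
        h, s, i = rgb_to_hsi(r, g, b)
        if is_achromatic(s, i):
            # Achromatic pixels are subdivided by intensity only.
            k = min(int(i * n_gray), n_gray - 1)
            labels.append(("A", k))
        else:
            # Chromatic pixels are subdivided by hue sector.
            k = int(h // (360.0 / n_hue)) % n_hue
            labels.append(("C", k))
    return labels


if __name__ == "__main__":
    sample = [(1, 0, 0), (0.1, 0.9, 0.5), (0.5, 0.5, 0.5), (0, 0, 0), (1, 1, 1)]
    print(segment_color_space(sample))
```

Splitting on saturation and intensity before looking at hue reflects a known property of HSI: hue is unstable or undefined for grayish, very dark, or very bright pixels. Such pixels are therefore handled by intensity alone.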
