Image mining of textual images using low-level image features

The mining of images from several categories is a problem that arises naturally in a wide range of circumstances. Image mining is concerned with extracting image data relationships and other patterns that are not explicitly stored in the images. Image classification is a large and growing field within image processing and is useful in CBIR (Content-Based Image Retrieval). Images can be classified according to their nature, content, or domain. In this paper, we present a novel unsupervised method for classifying textual images based on the distribution of various image features. Differences between images are computed from these features and used to classify textual images into three types: document images, caption text images, and scene text images. The classification relies on low-level features such as mean, skewness, energy, contrast, and homogeneity. In the first level of classification, each image is converted to grayscale, histogram features (mean, variance, and skewness) are extracted, and the WEKA J48 decision tree classifier labels the image as a document (Doc) or non-document (Non-Doc) image. In the second level, the grayscale image is sliced into binary form, and GLCM (Gray Level Co-occurrence Matrix) features are extracted from it. The GLCM features energy, entropy, contrast, and homogeneity are then used to classify the Non-Doc images as caption text or scene text images. We conducted experiments on 60 images of different types.
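The two feature-extraction levels described above can be sketched in plain Python. This is a minimal sketch under several assumptions the abstract does not state: grayscale conversion uses ITU-R BT.601 luma weights, the binary slice keeps the most significant bit plane (equivalent to thresholding 8-bit values at 128), the GLCM uses a single horizontal offset of one pixel, and skewness is the standardized third moment of the histogram. All function names are hypothetical. The sketch only computes feature vectors; it does not reproduce WEKA's J48 (a C4.5 decision tree), which would consume them.

```python
import math
from typing import Dict, List, Tuple

Gray = List[List[int]]  # rows of 8-bit grayscale values (0..255)


def to_grayscale(rgb: List[List[Tuple[int, int, int]]]) -> Gray:
    """Assumed conversion: ITU-R BT.601 luma weights (not specified in the paper)."""
    return [[min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb]


def histogram_features(gray: Gray, levels: int = 256) -> Dict[str, float]:
    """First level: mean, variance and skewness of the gray-level histogram."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    prob = [count / total for count in hist]  # normalized histogram p(i)
    mean = sum(i * p for i, p in enumerate(prob))
    variance = sum((i - mean) ** 2 * p for i, p in enumerate(prob))
    std = math.sqrt(variance)
    third = sum((i - mean) ** 3 * p for i, p in enumerate(prob))
    # Standardized third moment; defined as 0 for a constant image.
    skewness = third / std ** 3 if std > 0 else 0.0
    return {"mean": mean, "variance": variance, "skewness": skewness}


def binary_slice(gray: Gray, threshold: int = 128) -> Gray:
    """Assumed slicing rule: keep the most significant bit (value >= 128 -> 1)."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]


def glcm(binary: Gray, dx: int = 1, dy: int = 0, levels: int = 2) -> List[List[float]]:
    """Normalized gray level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(binary), len(binary[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[binary[y][x]][binary[ny][nx]] += 1
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("image is too small for this offset")
    return [[c / total for c in r] for r in counts]


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """Second level: energy, entropy, contrast and homogeneity of a normalized GLCM."""
    energy = entropy = contrast = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            energy += pij ** 2  # angular second moment
            contrast += (i - j) ** 2 * pij
            homogeneity += pij / (1 + abs(i - j))
            if pij > 0:
                entropy -= pij * math.log2(pij)
    return {"energy": energy, "entropy": entropy,
            "contrast": contrast, "homogeneity": homogeneity}


if __name__ == "__main__":
    img = [[0, 0, 255, 255],
           [0, 0, 255, 255]]
    print(histogram_features(img))
    print(glcm_features(glcm(binary_slice(img))))
```

Because the binary slice leaves only two gray levels, the GLCM is 2x2, so |i - j| and (i - j)^2 coincide and the common variants of the homogeneity formula give the same value.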
