Classifying images on the web automatically

Numerous research works on extracting low-level features from images and videos have been published. Only recently, however, has the focus shifted to exploiting these low-level features to classify images and videos automatically into semantically broad and meaningful categories. In this paper, novel classification algorithms are presented for three broad and general-purpose categories. Specifically, we present algorithms for distinguishing photo-like images from graphical images, actual photos from photo-like but artificial images, and presentation slides/scientific posters from comics. On a large image database, our classification algorithm achieved an accuracy of 97.69% in separating photo-like images from graphical images. Within the subset of photo-like images, true photos were separated from ray-traced/rendered images with an accuracy of 97.3%, while the subset of graphical images was partitioned into presentation slides/scientific posters and comics with an accuracy of 99.5%.
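The three decisions described above form a two-level hierarchy: a first split into photo-like versus graphical images, followed by one further split inside each branch. The following is a minimal sketch of that control flow only, assuming three already-trained binary classifiers. The function name, the parameter names, and the label strings are hypothetical and are not taken from the paper, which does not specify its features or learners in this abstract.

```python
from typing import Callable, Sequence

# A feature vector extracted from one image (its contents are an assumption).
Features = Sequence[float]
# Any trained binary decision function; a hypothetical stand-in for the
# paper's classifiers, whose internals are not described here.
BinaryClassifier = Callable[[Features], bool]


def classify_hierarchically(
    features: Features,
    is_photo_like: BinaryClassifier,
    is_true_photo: BinaryClassifier,
    is_slide_or_poster: BinaryClassifier,
) -> str:
    """Route one image through the two-level hierarchy and return a label."""
    if is_photo_like(features):
        # Photo-like branch: true photo versus ray-traced/rendered image.
        return "photo" if is_true_photo(features) else "rendered"
    # Graphical branch: presentation slide/scientific poster versus comic.
    return "slide_or_poster" if is_slide_or_poster(features) else "comic"
```

Only two of the three classifiers are evaluated for any given image, since the second-level classifier is chosen by the outcome of the first.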
