Functionality-Based Web Image Categorization

The World Wide Web provides an increasingly powerful and popular publication mechanism. Web documents often contain a large number of images that serve a variety of purposes. Identifying the functional categories of these images has important applications, including information extraction, web mining, web page summarization, and mobile access. In this paper, we outline a novel algorithm for automatically identifying two of the most important image categories: story images and preview images.
