A CBR System for Image-Based Webpage Classification: Case Representation with Convolutional Neural Networks

Over the past decade, the number of available webpages has grown exponentially. Automatic webpage categorization systems can help manage this volume of content, making search and recommendation tasks easier. However, most webpages contain a significant proportion of visual content that conventional, text-based web mining systems cannot handle. In this paper, we present a novel hybrid CBR framework designed to perform image-based webpage categorization. Our system incorporates state-of-the-art deep learning techniques, which help it attain high accuracy rates. In addition, the system was designed with the goal of minimizing computational costs.
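As a rough illustration of how a case-based classifier over image features might operate, the sketch below stores cases as (feature vector, category) pairs and classifies a query by a distance-weighted vote among its nearest cases. It is not the system described in the paper: the class `CaseBase`, its method names, the three-dimensional vectors, and the category labels are all hypothetical, and the feature vectors are assumed to have been produced beforehand by a convolutional neural network.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CaseBase:
    """Hypothetical case base: each case is a (feature_vector, label) pair.

    The feature vectors are assumed to come from a CNN applied to webpage
    images; here they are plain lists of floats.
    """

    def __init__(self):
        self.cases = []

    def retain(self, features, label):
        # Store a solved case for later retrieval.
        self.cases.append((list(features), label))

    def retrieve(self, query, k=3):
        # Return the k stored cases closest to the query.
        ranked = sorted(self.cases, key=lambda c: euclidean(c[0], query))
        return ranked[:k]

    def reuse(self, query, k=3):
        # Distance-weighted vote: closer cases contribute more.
        votes = defaultdict(float)
        for features, label in self.retrieve(query, k):
            votes[label] += 1.0 / (euclidean(features, query) + 1e-9)
        return max(votes, key=votes.get)


# Toy data (made up for illustration only).
case_base = CaseBase()
case_base.retain([0.9, 0.1, 0.0], "news")
case_base.retain([0.8, 0.2, 0.1], "news")
case_base.retain([0.1, 0.9, 0.2], "shopping")
case_base.retain([0.0, 0.8, 0.3], "shopping")

predicted = case_base.reuse([0.85, 0.15, 0.05], k=3)
print(predicted)
```

In this toy setup the query lies much closer to the two "news" cases than to either "shopping" case, so the weighted vote favors "news". A real deployment would replace the linear scan in `retrieve` with an indexed nearest-neighbor search to keep computational cost low.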