How do Convolutional Neural Networks Learn Design?

In this paper, we aim to understand the design principles in book cover images which are carefully crafted by experts. Book covers are designed in a unique way, specific to genres which convey important information to their readers. By using Convolutional Neural Networks (CNN) to predict book genres from cover images, visual cues which distinguish genres can be highlighted and analyzed. In order to understand these visual clues contributing towards the decision of a genre, we present the application of Layer-wise Relevance Propagation (LRP) on the book cover image classification results. We use LRP to explain the pixel-wise contributions of book cover design and highlight the design elements contributing towards particular genres. In addition, with the use of state-of-the-art object and text detection methods, insights about genre-specific book cover designs are discovered.

[1]  Klaus-Robert Müller,et al.  Interpretable human action recognition in compressed domain , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Alexander Binder,et al.  Explaining nonlinear classification decisions with deep Taylor decomposition , 2015, Pattern Recognit..

[3]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Pascal Vincent,et al.  Visualizing Higher-Layer Features of a Deep Network , 2009 .

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  Klaus-Robert Müller,et al.  Explaining Recurrent Neural Network Predictions in Sentiment Analysis , 2017, WASSA@EMNLP.

[8]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[10]  Roxzanne van Eyk,et al.  Judging a book by its cover , 2008 .

[11]  Bonnie L. Webber,et al.  Squibs: Stable Classification of Text Genres , 2011, CL.

[12]  Yi Li,et al.  Convolutional Neural Networks for Document Image Classification , 2014, 2014 22nd International Conference on Pattern Recognition.

[13]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[14]  Wei-Ta Chu,et al.  Movie Genre Classification based on Poster Images with Deep Neural Networks , 2017, MUSA2@MM.

[15]  Vineeth N. Balasubramanian,et al.  Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[16]  Seiichi Uchida,et al.  A Further Step to Perfect Accuracy by Training CNN with Larger Data , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[17]  Deborah Silver,et al.  Feature Visualization , 1994, Scientific Visualization.

[18]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Konstantinos G. Derpanis,et al.  Evaluation of deep convolutional nets for document image classification and retrieval , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[20]  James Ze Wang,et al.  Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.

[21]  Aidan Finn,et al.  Learning to classify documents according to genre , 2006, J. Assoc. Inf. Sci. Technol..

[22]  Alexander Binder,et al.  The LRP Toolbox for Artificial Neural Networks , 2016, J. Mach. Learn. Res..

[23]  M. Bassok,et al.  Judging a book by its cover: Interpretative effects of content on problem-solving transfer , 1995, Memory & cognition.

[24]  Klaus-Robert Müller,et al.  Identifying Individual Facial Expressions by Deconstructing a Neural Network , 2016, GCPR.

[25]  Trevor Darrell,et al.  Recognizing Image Style , 2013, BMVC.

[26]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[27]  Bryan Pardo,et al.  Classifying paintings by artistic genre: An analysis of features & classifiers , 2009, 2009 IEEE International Workshop on Multimedia Signal Processing.

[28]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[29]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Klaus-Robert Müller,et al.  "What is relevant in a text document?": An interpretable machine learning approach , 2016, PloS one.

[31]  Ichiro Fujinaga,et al.  Automatic Genre Classification Using Large High-Level Musical Feature Sets , 2004, ISMIR.

[32]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[33]  Marcus Liwicki,et al.  Deepdocclassifier: Document classification with deep Convolutional Neural Network , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[34]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[35]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.