Deep Generalized Max Pooling

Global pooling layers are an essential part of Convolutional Neural Networks (CNNs). Several state-of-the-art CNNs use them to aggregate the activations of all spatial locations into a fixed-size vector. Global average pooling and global max pooling are commonly used to convert the convolutional features of variable-size images into a fixed-size embedding. However, both pooling types are computed independently of spatial structure: each activation map is pooled on its own, so activations from different locations are pooled together. In contrast, we propose Deep Generalized Max Pooling, which balances the contribution of all activations of a spatially coherent region by re-weighting all descriptors so that the impact of frequent and rare ones is equalized. We show that this layer is superior to both average and max pooling on the classification of Latin medieval manuscripts (CLAMM'16, CLAMM'17), as well as on writer identification (Historical-WI'17).
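
The re-weighting idea can be illustrated with a minimal NumPy sketch. It assumes the ridge-regression formulation of Generalized Max Pooling (Murray and Perronnin), in which a pooled vector xi is sought whose dot product with every local descriptor is approximately 1. The function names, the (C, H, W) layout, and the regularization value `lam` are assumptions made for illustration; the abstract does not specify how the deep layer is implemented.

```python
import numpy as np

def global_avg_pool(fmap):
    # fmap: (C, H, W) feature map -> (C,) vector, one mean per activation map
    return fmap.mean(axis=(1, 2))

def global_max_pool(fmap):
    # fmap: (C, H, W) feature map -> (C,) vector, one maximum per activation map
    return fmap.max(axis=(1, 2))

def generalized_max_pool(fmap, lam=1e-3):
    # Assumed formulation: find xi minimizing ||phi^T xi - 1||^2 + lam * ||xi||^2,
    # where the columns of phi are the local descriptors (one per location).
    c, h, w = fmap.shape
    phi = fmap.reshape(c, h * w)
    a = phi @ phi.T + lam * np.eye(c)   # (C, C) regularized Gram matrix
    b = phi.sum(axis=1)                 # phi @ ones
    return np.linalg.solve(a, b)        # closed-form ridge solution

# Toy feature map: 9 locations carry a "frequent" descriptor (1, 0),
# a single location carries a "rare" descriptor (0, 1).
fmap = np.zeros((2, 2, 5))
fmap[0] = 1.0
fmap[0, 1, 4] = 0.0
fmap[1, 1, 4] = 1.0

xi = generalized_max_pool(fmap)
scores = fmap.reshape(2, -1).T @ xi     # similarity of each descriptor to xi

print("average pooling:", global_avg_pool(fmap))
print("generalized max pooling:", xi)
print("per-location scores:", scores)
```

In this toy case average pooling is dominated by the frequent descriptor, while the re-weighted solution gives the frequent and the rare descriptor nearly the same similarity to the pooled vector, which is the equalization the abstract describes.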
