Paper information - Exploiting Contextual Information with Deep Neural Networks

Exploiting Contextual Information with Deep Neural Networks

Context matters! Nevertheless, there has not been much research on exploiting contextual information in deep neural networks. For the most part, the use of contextual information has been limited to recurrent neural networks. Attention models and capsule networks are two recent ways of introducing contextual information into non-recurrent models; however, both were developed after this work had started. In this thesis, we show that contextual information can be exploited in two fundamentally different ways: implicitly and explicitly. In the DeepScore project, where context is very important for recognizing many tiny objects, we show that carefully crafted convolutional architectures can achieve state-of-the-art results while also implicitly learning to correctly distinguish between objects that are virtually identical but have different meanings depending on their surroundings. In parallel, we show that by explicitly designing algorithms (motivated by graph theory and game theory) that take into consideration the entire structure of the dataset, we can achieve state-of-the-art results in topics such as semi-supervised learning and similarity learning. To the best of our knowledge, we are the first to integrate graph-theoretical modules that are carefully crafted for the problem of similarity learning and designed to consider contextual information. The resulting models not only outperform the other models, but also run faster while using fewer parameters.

Ismail Elezi
