Semi-Automatic Data Annotation guided by Feature Space Projection

Abstract. Annotating data by visually inspecting (supervising) each training sample can be laborious. Interactive solutions alleviate this by helping experts propagate labels from a few supervised samples to unlabeled ones, based solely on visual analysis of their feature space projection and with no further sample supervision. We present a semi-automatic data annotation approach based on a suitable feature space projection and semi-supervised label estimation. We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities, a large and diverse dataset that makes classification very hard. We evaluate two approaches for semi-supervised learning from the latent and projection spaces, to choose the one that best reduces user annotation effort while also increasing classification accuracy on unseen data. Our results demonstrate the added value of visual analytics tools that combine the complementary abilities of humans and machines for more effective machine learning.
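
The pipeline outlined in the abstract (a latent feature space, a 2D projection of it, and semi-supervised label estimation from a few supervised samples) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: PCA stands in for a learned latent space, t-SNE serves as the projection, scikit-learn's `LabelSpreading` stands in for the semi-supervised estimator, the small scikit-learn digits set replaces MNIST, and the names `propagate` and `run_sketch` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.semi_supervised import LabelSpreading


def propagate(features, labels, seed_idx, n_neighbors=7):
    """Estimate labels for all samples from a few supervised seeds."""
    y = np.full(len(labels), -1)      # -1 marks unlabeled samples
    y[seed_idx] = labels[seed_idx]    # only the seeds keep their true label
    model = LabelSpreading(kernel="knn", n_neighbors=n_neighbors)
    model.fit(features, y)
    return model.transduction_        # one estimated label per sample


def run_sketch(n_samples=600, seeds_per_class=5, random_state=0):
    """Compare label propagation in a latent space and in its 2D projection."""
    X, y = load_digits(return_X_y=True)
    rng = np.random.default_rng(random_state)
    keep = rng.permutation(len(X))[:n_samples]
    X, y = X[keep] / 16.0, y[keep]    # digits pixels range from 0 to 16

    # Assumption: PCA is a stand-in for a learned latent representation.
    latent = PCA(n_components=32, random_state=random_state).fit_transform(X)
    # Assumption: t-SNE is used as the 2D feature space projection.
    projection = TSNE(
        n_components=2, init="pca", random_state=random_state
    ).fit_transform(latent)

    # A few supervised samples per class play the role of expert annotations.
    seed_idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), seeds_per_class, replace=False)
        for c in np.unique(y)
    ])
    unlabeled = np.setdiff1d(np.arange(len(y)), seed_idx)

    results = {}
    for name, feats in (("latent", latent), ("projection", projection)):
        pred = propagate(feats, y, seed_idx)
        # Accuracy is measured only on samples that were never supervised.
        results[name] = float(np.mean(pred[unlabeled] == y[unlabeled]))
    return results


if __name__ == "__main__":
    print(run_sketch())
```

Running the script prints the propagation accuracy obtained in each space on the unsupervised samples. The sketch omits the interactive part of the approach, in which an expert inspects the projection to guide labeling; it only shows how the choice of space can be compared for the automatic part.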
