Inspector Gadget

As machine learning for images becomes democratized in the Software 2.0 era, one of the serious bottlenecks is securing enough labeled data for training. This problem is especially critical in manufacturing, where smart factories rely on machine learning to perform product quality control by analyzing industrial images. Such images are typically large, and often only a small portion of each image is problematic and needs to be analyzed (e.g., when identifying defects on a surface). Since manually labeling these images is expensive, weak supervision is an attractive alternative: the idea is to generate weak labels that are not perfect but can be produced at scale. Data programming is a recent paradigm in this category that encodes human knowledge as labeling functions and combines them into a generative model. Data programming has been successful in applications based on text or structured data, and it can also be applied to images, usually if one can find a way to convert them into structured data. In this work, we expand the horizon of data programming by applying it directly to images without this conversion, which is a common scenario for industrial applications. We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification. We perform experiments on real industrial image datasets and show that Inspector Gadget obtains better performance than other weak-labeling techniques: Snuba, GOGGLES, and self-learning baselines using convolutional neural networks (CNNs) without pre-training.
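
To make the data programming idea concrete, the following is a minimal sketch of labeling functions that vote on an image and are then combined into one weak label. Everything in it is an illustrative assumption rather than Inspector Gadget's actual design: the two labeling functions (`lf_dark_patch`, `lf_uniform_surface`), their thresholds, and the label constants are hypothetical, and a simple majority vote stands in for the generative model that data programming would normally fit over the labeling function outputs.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical label constants; ABSTAIN means "this function has no opinion".
ABSTAIN, OK, DEFECT = -1, 0, 1

# A grayscale image as rows of pixel intensities in [0, 1] (assumed format).
Image = Sequence[Sequence[float]]


def lf_dark_patch(image: Image, patch: int = 2, threshold: float = 0.2) -> int:
    """Vote DEFECT if any patch x patch window has a mean intensity below threshold."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - patch + 1):
        for c in range(cols - patch + 1):
            window = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            if sum(window) / len(window) < threshold:
                return DEFECT
    return ABSTAIN


def lf_uniform_surface(image: Image, max_range: float = 0.1) -> int:
    """Vote OK if all pixel intensities lie within a narrow range."""
    pixels = [p for row in image for p in row]
    return OK if max(pixels) - min(pixels) <= max_range else ABSTAIN


def majority_vote(votes: Sequence[int]) -> int:
    """Combine votes, ignoring abstentions; ties and empty votes yield ABSTAIN.

    This is a simplified stand-in for the generative model used in data programming.
    """
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    top = counted.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]


def weak_label(image: Image, lfs: List[Callable[[Image], int]]) -> int:
    """Apply every labeling function to the image and combine their votes."""
    return majority_vote([lf(image) for lf in lfs])


if __name__ == "__main__":
    clean = [[0.8] * 4 for _ in range(4)]
    scratched = [row[:] for row in clean]
    for r in (1, 2):
        for c in (1, 2):
            scratched[r][c] = 0.05  # simulate a small dark defect region
    lfs = [lf_dark_patch, lf_uniform_surface]
    print(weak_label(clean, lfs), weak_label(scratched, lfs))
```

Note that only a small window of the image drives the `DEFECT` vote, which mirrors the observation above that industrial images are large while the problematic region is small. Replacing `majority_vote` with a learned model over the votes would bring the sketch closer to data programming proper.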
