Learning from the Tangram to Solve Mini Visual Tasks

Current pre-training methods in computer vision focus on natural images from daily life. However, abstract diagrams such as icons and symbols are also common and important in the real world. This work is inspired by the Tangram, a puzzle that requires replicating an abstract pattern from seven dissected shapes. By recording human experience in solving tangram puzzles, we present the Tangram dataset and show that a neural model pre-trained on the Tangram helps solve several mini visual tasks that rely on low-resolution vision. Extensive experiments demonstrate that our proposed method generates intelligent solutions for aesthetic tasks such as folding clothes and evaluating room layouts. The pre-trained feature extractor also accelerates the convergence of few-shot learning on human handwriting and improves the accuracy of identifying icons by their contours. The Tangram dataset is available at https://github.com/yizhouzhao/Tangram.