Learning from the Tangram to Solve Mini Visual Tasks

Current pre-training methods in computer vision focus on natural images from daily life. However, abstract diagrams such as icons and symbols are also common and important in the real world. This work is inspired by the Tangram, a puzzle that requires replicating an abstract pattern from seven dissected shapes. By recording human experience in solving tangram puzzles, we present the Tangram dataset and show that a neural model pre-trained on the Tangram helps solve several mini visual tasks that rely on low-resolution vision. Extensive experiments demonstrate that our proposed method generates intelligent solutions for aesthetic tasks such as folding clothes and evaluating room layouts. The pre-trained feature extractor also accelerates the convergence of few-shot learning on human handwriting and improves the accuracy of identifying icons by their contours. The Tangram dataset is available at https://github.com/yizhouzhao/Tangram.