Taskonomy: Disentangling Task Transfer Learning

Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity. We proposes a fully computational approach for modeling the structure of space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. We provide a set of tools for computing and probing this taxonomical structure including a solver users can employ to find supervision policies for their use cases.

[1]  Quoc V. Le,et al.  Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.

[2]  Erin Sullivan Mind , 2010, The Lancet.

[3]  Andrea Vedaldi,et al.  Integrated perception with recurrent multi-task neural networks , 2016, NIPS.

[4]  Daniel L. Silver,et al.  Guest editor’s introduction: special issue on inductive transfer learning , 2008, Machine Learning.

[5]  Yi Li,et al.  Fully Convolutional Instance-Aware Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Sergey Levine,et al.  Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.

[7]  M. M. Hassan Mahmud,et al.  On universal transfer learning , 2007, Theor. Comput. Sci..

[8]  Michal Irani,et al.  Similarity by Composition , 2006, NIPS.

[9]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[10]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[11]  Trevor Darrell,et al.  Continuous Manifold Based Adaptation for Evolving Visual Domains , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Raymond J. Mooney,et al.  Mapping and Revising Markov Logic Networks for Transfer Learning , 2007, AAAI.

[13]  J. Tenenbaum,et al.  Opinion TRENDS in Cognitive Sciences Vol.10 No.7 July 2006 Special Issue: Probabilistic models of cognition Theory-based Bayesian models of inductive learning and reasoning , 2022 .

[14]  Luc Van Gool,et al.  Hough Transform and 3D SURF for Robust Three Dimensional Classification , 2010, ECCV.

[15]  Elie Bienenstock,et al.  Compositionality, MDL Priors, and Object Recognition , 1996, NIPS.

[16]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Joachim Bingel,et al.  Identifying beneficial task relations for multi-task learning in deep neural networks , 2017, EACL.

[18]  Jonathan Baxter,et al.  A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling , 1997, Machine Learning.

[19]  Aditya Bhaskara,et al.  Provable Bounds for Learning Some Deep Representations , 2013, ICML.

[20]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[21]  Joshua B. Tenenbaum,et al.  One-Shot Learning with a Hierarchical Nonparametric Bayesian Model , 2011, ICML Unsupervised and Transfer Learning.

[22]  Rong Yan,et al.  Adapting SVM Classifiers to Data with Shifted Distributions , 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[23]  Maks Ovsjanikov,et al.  Functional maps , 2012, ACM Trans. Graph..

[24]  Leslie G. Ungerleider,et al.  Curvature-processing network in macaque visual cortex , 2014, Proceedings of the National Academy of Sciences.

[25]  Bing Liu,et al.  Lifelong machine learning: a paradigm for continuous learning , 2017, Frontiers of Computer Science.

[26]  Pavel Berkhin,et al.  A Survey of Clustering Data Mining Techniques , 2006, Grouping Multidimensional Data.

[27]  R. French Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.

[28]  Edward H. Adelson,et al.  The perception of shading and reflectance , 1996 .

[29]  Tinne Tuytelaars,et al.  Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  A. M. Turing,et al.  Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.

[32]  N. Schaumberger Generalization , 1989, Whitehead and Philosophy of Education.

[33]  Ondrej Miksik Rapid vanishing point estimation for general road detection , 2012, 2012 IEEE International Conference on Robotics and Automation.

[34]  Jitendra Malik,et al.  Generic 3D Representation via Pose Estimation and Matching , 2016, ECCV.

[35]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[36]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[37]  Charles Kemp,et al.  How to Grow a Mind: Statistics, Structure, and Abstraction , 2011, Science.

[38]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[39]  Michael Johnson Compositionality , 2020, The Wiley Blackwell Companion to Semantics.

[40]  Jitendra Malik,et al.  The three R's of computer vision: Recognition, reconstruction and reorganization , 2016, Pattern Recognit. Lett..

[41]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[42]  Marcin Andrychowicz,et al.  Learning to learn by gradient descent by gradient descent , 2016, NIPS.

[43]  Shie Mannor,et al.  A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.

[44]  W. Marsden I and J , 2012 .

[45]  Alan L. Yuille,et al.  The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference , 2000, NIPS.

[46]  Silvio Savarese,et al.  Joint 2D-3D-Semantic Data for Indoor Scene Understanding , 2017, ArXiv.

[47]  M. Wertheimer Laws of organization in perceptual forms. , 1938 .

[48]  Sergey Levine,et al.  Generalizing Skills with Semi-Supervised Reinforcement Learning , 2016, ICLR.

[49]  Andrew Zisserman,et al.  Tabula rasa: Model transfer for object category detection , 2011, 2011 International Conference on Computer Vision.

[50]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[51]  Sergey Levine,et al.  One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.

[52]  J. Richards,et al.  On the nature of the visual-cliff-avoidance response in human infants. , 1980, Child development.

[53]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[54]  G. Carpenter,et al.  Behavioral and Brain Sciences , 1999 .

[55]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[56]  C. Vidal,et al.  STAT , 2019, Springer Reference Medizin.

[57]  Pascal Vasseur,et al.  Globally optimal line clustering and vanishing point estimation in Manhattan world , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[59]  Rong Ge,et al.  Provable algorithms for machine learning problems , 2013 .

[60]  Qiang Yang,et al.  Lifelong Machine Learning Systems: Beyond Learning Algorithms , 2013, AAAI Spring Symposium: Lifelong Machine Learning.

[61]  James R. Bergen,et al.  Visual odometry , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[62]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[63]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[64]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[65]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[66]  Andrew Zisserman,et al.  Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[67]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[68]  Paolo Favaro,et al.  Representation Learning by Learning to Count , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[69]  Hamid Izadinia,et al.  IM2CAD , 2016, 1608.05137.

[70]  A. Gopnik,et al.  The scientist in the crib : minds, brains, and how children learn , 1999 .

[71]  Abhinav Gupta,et al.  Unsupervised Learning of Visual Representations Using Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[72]  Christoph H. Lampert,et al.  Multi-task Learning with Labeled and Unlabeled Tasks , 2016, ICML.

[73]  R. Lathe Phd by thesis , 1988, Nature.

[74]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[75]  Mason A. Porter,et al.  Random walks and diffusion on networks , 2016, ArXiv.

[76]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[77]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[78]  Jean Ponce,et al.  Vanishing point detection for road detection , 2009, CVPR.

[79]  Lu Wang,et al.  Wide-baseline image matching using Line Signatures , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[80]  David M. Sobel,et al.  A theory of causal learning in children: causal maps and Bayes nets. , 2004, Psychological review.

[81]  J. Piaget,et al.  The Origins of Intelligence in Children , 1971 .

[82]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[83]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[84]  Supun Samarasekera,et al.  Ten-fold Improvement in Visual Odometry Using Landmark Matching , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[85]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[86]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[87]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[88]  Gui-Song Xia,et al.  Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery , 2015, Remote. Sens..

[89]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[90]  I K Fodor,et al.  A Survey of Dimension Reduction Techniques , 2002 .

[91]  R. Held,et al.  MOVEMENT-PRODUCED STIMULATION IN THE DEVELOPMENT OF VISUALLY GUIDED BEHAVIOR. , 1963, Journal of comparative and physiological psychology.

[92]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93]  Patrick J. Roa Volume 8 , 2001 .

[94]  Vladlen Koltun,et al.  Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[95]  Dong Liu,et al.  Robust visual domain adaptation with low-rank reconstruction , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[96]  Iasonas Kokkinos,et al.  UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[97]  Peter L. Bartlett,et al.  RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.

[98]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[99]  R. W. Saaty,et al.  The analytic hierarchy process—what it is and how it is used , 1987 .

[100]  Ethan Rublee,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[101]  Shai Ben-David,et al.  A notion of task relatedness yielding provable multiple-task learning guarantees , 2008, Machine Learning.

[102]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[103]  Gary R. Bradski,et al.  A codebook-free and annotation-free approach for fine-grained image categorization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[104]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[105]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[106]  R. Horaud,et al.  Surface feature detection and description with applications to mesh matching , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[107]  Stephen Lin,et al.  Semantic colorization with internet images , 2011, ACM Trans. Graph..

[108]  Lorien Y. Pratt,et al.  Discriminability-Based Transfer between Neural Networks , 1992, NIPS.

[109]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[110]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[111]  Shmuel Peleg,et al.  Visual Learning of Arithmetic Operation , 2015, AAAI.

[112]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[113]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[114]  Mohammed Bennamoun,et al.  On the Repeatability and Quality of Keypoints for Local Feature-based 3D Object Retrieval from Cluttered Scenes , 2009, International Journal of Computer Vision.

[115]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[116]  Terry Winograd,et al.  Thinking Machines: Can There Be? Are We? , 1990, Informatica.

[117]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[118]  Michael McCloskey,et al.  Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .

[119]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[120]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[121]  Sergey Levine,et al.  Deep spatial autoencoders for visuomotor learning , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[122]  Fei-Fei Li,et al.  Label Efficient Learning of Transferable Representations acrosss Domains and Tasks , 2017, NIPS.

[123]  J. Tenenbaum,et al.  Generalization, similarity, and Bayesian inference. , 2001, The Behavioral and brain sciences.

[124]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[125]  Rama Chellappa,et al.  Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.

[126]  Rich Caruana,et al.  Inductive Transfer for Bayesian Network Structure Learning , 2007, ICML Unsupervised and Transfer Learning.

[127]  M. Biot,et al.  QUARTERLY OF APPLIED MATHEMATICS , 1972 .

[128]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[129]  Jitendra Malik,et al.  Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies , 2018 .

[130]  Guosheng Lin,et al.  CRF Learning with CNN Features for Image Segmentation , 2015, Pattern Recognit..

[131]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[132]  Jitendra Malik,et al.  Learning to See by Moving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[133]  Joshua B Tenenbaum,et al.  Toward the neural implementation of structure learning , 2016, Current Opinion in Neurobiology.

[134]  Tomasz Malisiewicz,et al.  Deep Image Homography Estimation , 2016, ArXiv.

[135]  Yu Zhong,et al.  Intrinsic shape signatures: A shape descriptor for 3D object recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[136]  Kevin J. Henry The Theory and Applications of Homomorphic Cryptography , 2008 .

[137]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[138]  Jitendra Malik,et al.  Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[139]  Reinhard Koch,et al.  Vanishing Point Estimation and Line Classification in a Manhattan World with a Unifying Camera Model , 2016, International Journal of Computer Vision.

[140]  Sebastian Thrun,et al.  Learning to Learn , 1998, Springer US.

[141]  Abhinav Gupta,et al.  Transitive Invariance for Self-Supervised Visual Representation Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[142]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[143]  Ian D. Reid,et al.  Locally Planar Patch Features for Real-Time Structure from Motion , 2004, BMVC.

[144]  R. French Catastrophic Forgetting in Connectionist Networks: Causes, Consequences and Solutions , 1999 .

[145]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[146]  Trevor Darrell,et al.  What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.

[147]  Michal Irani,et al.  “Clustering by Composition”—Unsupervised Discovery of Image Categories , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.