Self-Supervised Learning for Specified Latent Representation

Latent representations learned by unsupervised methods generally carry no explicit semantic meaning, which makes it difficult to relate them directly to a physical task in the real world. To address this, this paper proposes a specified latent representation with physical semantic meaning. First, a few labeled samples are used to generate the framework of the latent space: each labeled sample is mapped to a framework node in the latent space. Second, a self-learning method using structured unlabeled samples is proposed to shape the free space between the framework nodes. The proposed specified latent representation therefore combines the advantages of supervised and unsupervised learning. The method is verified by numerical simulations and real-world experiments.
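
The two-step idea can be illustrated with a minimal sketch. It is not the paper's actual objective, which the abstract does not state. The sketch assumes that "structured" unlabeled samples form an ordered sequence and that the free space is shaped by asking consecutive latent codes to be evenly spaced. The function name `combined_loss`, the `weight` parameter, and the second-difference penalty are all hypothetical choices made for illustration.

```python
import numpy as np

def combined_loss(z_labeled, targets, z_sequence, weight=1.0):
    """Illustrative loss: anchor term plus a smoothness term (both assumed)."""
    # Supervised term: pull the codes of labeled samples onto their
    # assigned framework nodes in the latent space.
    anchor = np.mean(np.sum((z_labeled - targets) ** 2, axis=1))
    # Self-supervised term (assumption): for an ordered unlabeled sequence,
    # penalize the second difference so consecutive codes are evenly spaced.
    second_diff = z_sequence[2:] - 2.0 * z_sequence[1:-1] + z_sequence[:-2]
    smooth = np.mean(np.sum(second_diff ** 2, axis=1))
    return anchor + weight * smooth

# Two framework nodes at 0 and 1 in a one-dimensional latent space.
nodes = np.array([[0.0], [1.0]])
# Unlabeled codes that interpolate evenly between the nodes.
even_codes = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
# Unlabeled codes that jump abruptly between the nodes.
bent_codes = np.array([[0.0], [0.0], [1.0]])

loss_even = combined_loss(nodes, nodes, even_codes)
loss_bent = combined_loss(nodes, nodes, bent_codes)
```

In this toy setting the evenly spaced codes incur no penalty while the abrupt jump does, which is one way the unlabeled samples could constrain the region between labeled nodes.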
