Domain adaptation with low-rank alignment for weakly supervised hand pose recovery

Abstract Human hand pose recovery (HPR) from depth images is usually performed by constructing mappings between 2D depth images and 3D hand poses. The task is challenging because the feature spaces of 2D images and 3D poses differ. Consequently, a large amount of labeled data is required for training, especially for popular frameworks such as deep learning. In this paper, we propose an HPR method with weak supervision. It is based on a neural network, and domain adaptation is introduced to enhance the trained model. To achieve domain adaptation, we propose low-rank alignment, which aligns the testing samples to the distribution of the labeled samples. In this process, autoencoders are used to extract 2D image features, and a low-rank representation is used to describe this feature space. The proposed method is therefore named Domain Adaptation with Low-Rank Alignment (DALA). In this way, we obtain a robust, non-linear mapping from 2D images to 3D poses. Experiments are conducted on two challenging benchmark datasets, MSRA and ICVL. Results both on a single dataset and across datasets show the outstanding performance of DALA.
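The abstract does not spell out the optimization behind low-rank alignment, so the following is only a minimal sketch of the general idea, not the DALA algorithm itself. It assumes the autoencoder features are already available as matrix columns, and it stands in for the alignment step with a plain truncated-SVD projection: test features are re-expressed as combinations of labeled features through a coefficient matrix of bounded rank. The function names `low_rank_coefficients` and `align_to_labeled` and the `rank` parameter are hypothetical.

```python
import numpy as np


def low_rank_coefficients(labeled, test, rank):
    """Express test features as combinations of labeled features.

    labeled: (d, n) array, one labeled feature vector per column.
    test:    (d, m) array, one test feature vector per column.
    rank:    number of leading singular directions of `labeled` to keep
             (assumed not to exceed the numerical rank of `labeled`).
    Returns Z of shape (n, m) whose rank is at most `rank`.
    """
    U, s, Vt = np.linalg.svd(labeled, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]
    # Pseudo-inverse of the rank-truncated labeled matrix applied to test.
    return Vt_r.T @ ((U_r.T @ test) / s_r[:, None])


def align_to_labeled(labeled, test, rank):
    """Map test features into the low-rank subspace of the labeled features."""
    Z = low_rank_coefficients(labeled, test, rank)
    # labeled @ Z equals the projection of `test` onto the top-`rank` subspace.
    return labeled @ Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.standard_normal((5, 2))             # hypothetical 2-D feature subspace
    labeled = basis @ rng.standard_normal((2, 20))  # labeled features
    test = basis @ rng.standard_normal((2, 4)) + 0.5 * rng.standard_normal((5, 4))
    aligned = align_to_labeled(labeled, test, rank=2)
    print(aligned.shape)
```

In this simplified form, a test feature that already lies in the labeled subspace is left unchanged, while components outside that subspace are discarded. A pose regressor trained on the labeled features would then be applied to the aligned test features.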
