Human motion segmentation based on low-rank representation

We propose an algorithm based on Low-Rank Representation (LRR) for human motion segmentation. LRR seeks the lowest-rank representation of all the data jointly, expressing each data vector as a linear combination of the bases in a dictionary. In a human motion video, each frame can be regarded as an image, that is, a collection of data vectors to be represented jointly. In many cases the background variations can be assumed to be low-rank, while the foreground human motion is sparse, so the human motion part can be obtained by removing the low-rank part from the original image. The problem thus reduces to finding the low-rank representation of the image. We formulate this as a convex optimization problem that minimizes a constrained combination of the nuclear norm and the ℓ2,1-norm, which can be solved efficiently with the Augmented Lagrange Multiplier (ALM) method. Compared with several existing methods for human motion segmentation, the proposed method produces more reliable results and is more robust to noise and outliers. Experiments on the HumanEva human motion dataset show that human motion segmentation with the proposed method is robust.
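The optimization described above can be illustrated with a minimal sketch. It assumes the standard LRR program of Liu et al. (Robust Subspace Segmentation by Low-Rank Representation, ICML 2010), with the data matrix X used as its own dictionary: minimize ||Z||_* + λ||E||_{2,1} subject to X = XZ + E, solved by an inexact ALM iteration with an auxiliary variable J = Z. The abstract does not say how the authors build the data matrix from video frames or which parameters they use, so the function names, the defaults (lam, rho, mu) and the toy data below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def shrink_columns(M, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1-norm."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return M * scale


def lrr_inexact_alm(X, lam=0.1, rho=1.1, mu=1e-6, mu_max=1e10,
                    tol=1e-6, max_iter=1000):
    """Sketch of inexact ALM for: min ||Z||_* + lam*||E||_{2,1} s.t. X = XZ + E.

    Columns of X are data vectors. All default parameters are assumptions.
    """
    d, n = X.shape
    Z = np.zeros((n, n))
    J = np.zeros((n, n))       # auxiliary copy of Z carrying the nuclear norm
    E = np.zeros((d, n))
    Y1 = np.zeros((d, n))      # multiplier for X = XZ + E
    Y2 = np.zeros((n, n))      # multiplier for Z = J
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(n) + XtX)  # reused in every Z update
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)
        Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        E = shrink_columns(X - X @ Z + Y1 / mu, lam / mu)
        R1 = X - X @ Z - E     # residual of the data constraint
        R2 = Z - J             # residual of the splitting constraint
        Y1 += mu * R1
        Y2 += mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E


# Toy data (hypothetical): rank-3 columns with the first three columns corrupted.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 30))
X[:, :3] += 5.0 * rng.standard_normal((20, 3))
Z, E = lrr_inexact_alm(X)
low_rank_part = X @ Z          # what remains after removing E from X
```

In this sketch, X @ Z plays the role of the low-rank part and E the part left after removing it. Because the ℓ2,1-norm penalizes whole columns, E tends to concentrate on a few data vectors rather than on scattered entries; how well that matches a given video depends on how frames are mapped to columns, which the abstract leaves open.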