A Survey of Applications and Human Motion Recognition with Microsoft Kinect

Microsoft Kinect, a low-cost motion-sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands, without any other peripheral equipment. As such, it has attracted intense interest in research and development on the Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the applications of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, and 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies used, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers investigating better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection, and human pose estimation.
