Markerless 3D human pose tracking through multiple cameras and AI: Enabling high accuracy, robustness, and real-time performance

Tracking 3D human motion in real time is crucial for applications in many fields. Traditional approaches attach artificial fiducial markers or sensors to the body, which limits usability and comfort and consequently narrows their range of application. Recent advances in Artificial Intelligence (AI) have enabled markerless solutions. However, most of these methods operate in 2D, while those that provide 3D solutions compromise accuracy and real-time performance. To address this challenge and unlock the potential of visual pose estimation in real-world scenarios, we propose a markerless framework that combines multiple camera views with 2D AI-based pose estimation to track 3D human motion. Our approach uses a Weighted Least Squares (WLS) algorithm that computes 3D human motion from multiple 2D pose estimates provided by an AI-driven method. The method is integrated within the Open-VICO framework, allowing both simulation and real-world execution. Experiments show high accuracy and real-time performance, demonstrating a high level of readiness for real-world applications and the potential to revolutionize human motion capture.
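
The abstract does not spell out the WLS formulation, so the following is only a minimal sketch of one common way to set up such a problem: a weighted linear triangulation of a single joint from several calibrated views. It assumes each camera has a known 3x4 projection matrix and that the 2D pose estimator returns a per-joint confidence score that serves as the weight. The function name `triangulate_wls` and its parameters are hypothetical and do not come from the paper or from Open-VICO.

```python
import numpy as np


def triangulate_wls(proj_mats, points_2d, weights):
    """Weighted least-squares triangulation of one 3D point (illustrative sketch).

    proj_mats: (N, 3, 4) camera projection matrices (assumed known from calibration)
    points_2d: (N, 2) pixel coordinates of the same joint in each view
    weights:   (N,) non-negative weights, e.g. 2D detector confidences (assumption)
    """
    P = np.asarray(proj_mats, dtype=float)
    uv = np.asarray(points_2d, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Each view gives two linear equations in the homogeneous point [X, 1]:
    #   (u * P[2] - P[0]) . [X, 1] = 0
    #   (v * P[2] - P[1]) . [X, 1] = 0
    rows_u = uv[:, 0:1] * P[:, 2, :] - P[:, 0, :]
    rows_v = uv[:, 1:2] * P[:, 2, :] - P[:, 1, :]
    A = np.concatenate([rows_u, rows_v], axis=0)  # shape (2N, 4)

    # Minimising sum_i w_i * r_i^2 is ordinary least squares on rows
    # scaled by sqrt(w_i).
    A = A * np.sqrt(np.concatenate([w, w]))[:, None]

    # Move the homogeneous column to the right-hand side and solve for X.
    X, *_ = np.linalg.lstsq(A[:, :3], -A[:, 3], rcond=None)
    return X
```

Calling this once per joint and per frame yields a 3D skeleton. A view with weight zero, for example one where the joint is occluded, contributes nothing to the solution, which is the main motivation for weighting by detection confidence in a setup like this.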
