What Face and Body Shapes Can Tell About Height

Recovering a person's height from a single image is important for virtual garment fitting, autonomous driving, and surveillance. It is also very challenging, because a single image carries no absolute scale information. We tackle the rarely addressed case in which camera parameters and scene geometry are unknown. To nevertheless resolve the inherent scale ambiguity, we infer height from statistics that are intrinsic to human anatomy and can be estimated directly from images, such as articulated pose, bone length proportions, and facial features. Our contribution is twofold. First, we experiment with different machine learning models to capture the relation between image content and human height. Second, we show that performance is predominantly limited by dataset size, and we create a new dataset that is three orders of magnitude larger by mining explicit height labels and propagating them to additional images through face recognition and assignment consistency. Our evaluation shows that monocular height estimation is possible with a mean absolute error (MAE) of 5.56 cm.
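As a rough illustration of two ideas mentioned above, the following sketch shows why bone length proportions are unaffected by the scale ambiguity and how an MAE in centimetres is computed. It is a minimal sketch under stated assumptions, not the method evaluated here: the function names, the toy skeleton, and the example heights are all hypothetical.

```python
import numpy as np


def bone_length_ratios(joints, bones):
    """Return each bone length divided by the summed bone length.

    joints: (J, 2) or (J, 3) array of joint positions in arbitrary units.
    bones:  list of (parent, child) joint index pairs (hypothetical skeleton).
    The ratios are invariant to a uniform rescaling of the joints, which is
    why proportions remain usable when absolute scale is unknown.
    """
    joints = np.asarray(joints, dtype=float)
    lengths = np.array([np.linalg.norm(joints[a] - joints[b]) for a, b in bones])
    return lengths / lengths.sum()


def mean_absolute_error(pred_cm, true_cm):
    """Mean absolute error between predicted and true heights, in cm."""
    pred = np.asarray(pred_cm, dtype=float)
    true = np.asarray(true_cm, dtype=float)
    return float(np.mean(np.abs(pred - true)))


# Toy 2D skeleton (made-up coordinates): hip, knee, ankle, neck.
toy_joints = np.array([[0.0, 0.0], [0.0, -4.0], [0.0, -7.0], [0.0, 5.0]])
toy_bones = [(0, 1), (1, 2), (0, 3)]  # thigh, shin, torso

ratios = bone_length_ratios(toy_joints, toy_bones)
ratios_scaled = bone_length_ratios(2.5 * toy_joints, toy_bones)

# Made-up predictions and labels, only to show the metric.
example_mae = mean_absolute_error([170.0, 180.0], [175.0, 178.0])
```

Here `ratios` and `ratios_scaled` agree, since a uniform rescaling of the image cancels out in the proportions, while `example_mae` averages the absolute errors of 5 cm and 2 cm.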
