Sense beauty via face, dressing, and/or voice

Discovering the secret of beauty has been the pursuit of artists and philosophers for centuries. Nowadays, computational models for beauty estimation are being actively explored in the computer science community, yet the focus has remained mainly on facial features. In this work, we perform a comprehensive study of female attractiveness conveyed by single or multiple modalities of cues, i.e., face, dressing, and/or voice, and aim to uncover how the different modalities individually and collectively affect the human sense of beauty. To this end, we collect the first Multi-Modality Beauty (M2B) dataset for female attractiveness study, thoroughly annotated with attractiveness levels converted from manual k-wise ratings as well as semantic attributes for each modality. A novel Dual-supervised Feature-Attribute-Task (DFAT) network is proposed to jointly learn the beauty estimation models of single and multiple modalities together with the attribute estimation models. The DFAT network is distinguished by its supervision at both the attribute and task layers. Several interesting observations on the sense of beauty across single and multiple modalities are reported, and extensive experimental evaluations on the collected M2B dataset demonstrate the effectiveness of the proposed DFAT network for female attractiveness estimation.
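To make the dual-supervision idea concrete, below is a minimal sketch of what supervising both an attribute layer and a task layer on top of shared features might look like. This is an illustrative assumption, not the paper's implementation: the layer sizes, the feed-forward architecture, the loss choices, and names such as `DFATSketch` are all hypothetical.

```python
# Hypothetical sketch of dual supervision at the attribute and task layers.
# Assumes pre-extracted multi-modal features (face/dressing/voice) concatenated
# into one vector; all dimensions and loss weights are illustrative only.
import torch
import torch.nn as nn

class DFATSketch(nn.Module):
    def __init__(self, feat_dim=512, num_attrs=40):
        super().__init__()
        # Shared feature layer over the concatenated multi-modal features.
        self.feature = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        # Attribute layer: predicts semantic attributes (directly supervised).
        self.attribute = nn.Linear(256, num_attrs)
        # Task layer: regresses an attractiveness level from features plus
        # predicted attributes (also directly supervised).
        self.task = nn.Linear(256 + num_attrs, 1)

    def forward(self, x):
        h = self.feature(x)
        attrs = torch.sigmoid(self.attribute(h))
        score = self.task(torch.cat([h, attrs], dim=1))
        return attrs, score

# Dual-supervised objective: one loss term on the attribute layer,
# one on the task layer, trained jointly.
model = DFATSketch()
x = torch.randn(8, 512)                          # dummy multi-modal features
attr_gt = torch.randint(0, 2, (8, 40)).float()   # dummy binary attribute labels
score_gt = torch.randn(8, 1)                     # dummy attractiveness levels
attrs, score = model(x)
loss = nn.BCELoss()(attrs, attr_gt) + nn.MSELoss()(score, score_gt)
loss.backward()
```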
