User Profiling through Deep Multimodal Fusion

User profiling in social media has attracted considerable attention because of its applications in advertising, marketing, recruiting, and law enforcement. Among user modeling techniques, however, there is relatively little work on how to merge multiple sources or modalities of user data (such as text, images, and relations) to arrive at more accurate user profiles. In this paper, we propose a deep learning approach that extracts and fuses information across modalities. Our hybrid user profiling framework uses a shared representation between modalities to integrate three sources of data at the feature level, and combines the decisions of separate networks, each operating on a combination of data sources, at the decision level. Experimental results on more than 5K Facebook users show that our approach outperforms competing approaches for inferring the age, gender, and personality traits of social media users. The approach yields highly accurate results, with AUC values of more than 0.9 for age prediction and 0.95 for gender prediction.
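The hybrid scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the control flow under stated assumptions. Feature-level fusion is approximated by plain concatenation (the paper uses a learned shared representation), each "network" is replaced by a hypothetical `toy_scorer`, the combinations are assumed to be all non-empty subsets of the three modalities, and decision-level fusion is assumed to be a simple average. All function names and the modality labels are illustrative.

```python
import math
from itertools import combinations

# Assumed modality labels; the abstract names text, images, and relations.
MODALITIES = ("text", "image", "relation")


def modality_subsets(modalities):
    """Return every non-empty combination of modalities (assumption:
    one network per combination, singletons included)."""
    return [
        subset
        for size in range(1, len(modalities) + 1)
        for subset in combinations(modalities, size)
    ]


def fuse_features(features, subset):
    """Feature-level fusion stand-in: concatenate the per-modality vectors.
    The paper learns a shared representation instead."""
    fused = []
    for name in subset:
        fused.extend(features[name])
    return fused


def toy_scorer(vector):
    """Hypothetical stand-in for a trained network: logistic of the mean."""
    mean = sum(vector) / len(vector)
    return 1.0 / (1.0 + math.exp(-mean))


def fuse_decisions(probabilities):
    """Decision-level fusion stand-in: average the per-network outputs."""
    return sum(probabilities) / len(probabilities)


def predict(features):
    """Score each modality combination, then combine the decisions."""
    subsets = modality_subsets(tuple(features))
    scores = [toy_scorer(fuse_features(features, s)) for s in subsets]
    return fuse_decisions(scores)
```

With three modalities this yields seven modality combinations, each scored separately before averaging. In a real system each call to `toy_scorer` would be replaced by a separately trained network, and the averaging step could be replaced by a learned combiner.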
