Towards User Personality Profiling from Multiple Social Networks

The exponential growth of online social networks has inspired us to tackle the problem of inferring individual user attributes from the Big Data perspective. It is well known that different social media networks capture different aspects of user interactions and thus represent users from diverse points of view. In this preliminary study, we take the first step towards solving the significant problem of personality profiling from multiple social networks. Specifically, we tackle the task of relationship prediction, which is closely related to our target problem. Experimental results show that incorporating multi-source data yields better prediction performance than single-source baselines.

User profiling plays an increasingly important role in many application domains (Farseev, Samborskii, and Chua 2016). One of the critical components of user profiling is personality profiling (Pennebaker, Mehl, and Niederhoffer 2003), which seeks to identify one’s mental and emotional characteristics. Knowing these personal attributes can help to understand the reasons behind one’s behaviour (Pennebaker, Mehl, and Niederhoffer 2003), select suitable individuals for particular tasks (Song et al. 2015), and motivate people to undertake new challenges in their lives. There have been several research attempts at personality profiling. For example, some research groups have investigated this problem from the social science point of view (Pennebaker, Mehl, and Niederhoffer 2003). However, most of these works are descriptive in nature and rely on manual data collection procedures, which explains the absence of large-scale research in the field. With the recent growth of the Web, personality profiling can be approached by taking advantage of the abundance of data from online social networks.
For example, such data has been utilized by several studies and evaluations devoted to automatic personality profiling, such as TwiSty (Verhoeven, Daelemans, and Plank 2016) and PAN (Rangel et al. 2015). Even though these studies made significant progress towards automatic personality profiling, most of them were carried out on data from a single source (i.e., Twitter) or of a single modality (i.e., text). Such personality profiling may lead to sub-optimal performance (Farseev and Chua 2017). Given that most social network users use more than one social network in their daily life (Farseev et al. 2015a), it is reasonable to utilize multiple data sources and modalities to solve the personality profiling task.

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

There are several personality categorization schemes adopted by the research community. One of the most widely embraced typologies is the Myers-Briggs Type Indicator (MBTI), which was proposed by Myers and Briggs in 1985 and is based on Carl Jung’s theory. The typology is designed to capture psychological preferences in how people perceive the world around them, and it distinguishes 16 personality types. Meanwhile, it has also been found that social media services strongly affect and reflect the way their users communicate with the world and among themselves (Kaplan and Haenlein 2010). Based on these observations, the MBTI categorization scheme naturally fits social media research. Further, according to previous studies (Farseev et al. 2015b; Farseev and Chua 2017) and our findings, social media users reveal their personal attributes differently on different social media platforms. For example, they may post photos in photo-sharing services, such as Instagram, or perform check-ins in location-based social networks, such as Foursquare.
All this data describes users from a 360° view and thus plays an essential role in social media-based personality profiling. However, personality profiling from multiple social networks is associated with the following challenges:
• Cross-source user identification. It is often not possible to identify the multiple social network accounts that belong to the same person, and some users use only a limited number of social networks.
• Ground-truth collection. Not all online resources with MBTI information about their users are approved by psychologists, and only a limited number of social network posts include references to trusted MBTI profiling resources.
• Temporal changes of users’ personality. Users’ personality may vary over time under the influence of different life aspects and external factors, which requires additional consideration during the data modeling process.
• Data source fusion. Effective fusion of multi-view data from different sources in one model is a challenging problem (Song et al. 2015).

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)