Current and Future Psychological Health Prediction using Language and Socio-Demographics of Children for the CLPsych 2018 Shared Task

This article describes the system submitted by a team from the University of Pennsylvania to the CLPsych 2018 shared task. The goal of the shared task was to use childhood language as a marker for both current and future psychological health over individual lifetimes. Our system employs multiple textual features derived from the essays the children wrote and from individuals' socio-demographic variables at the age of 11. We considered several word clustering approaches and explored the use of linear regression based on different feature sets. Our approach achieved the best results for predicting distress at the age of 42 and for predicting current anxiety, as measured by disattenuated Pearson correlation, and ranked fourth in the future health prediction task. In addition to the subtasks presented, we attempted to provide insight into mental health aspects at different ages. Our findings indicate that misspellings, words with illegible letters, and increased use of personal pronouns are correlated with poor mental health at age 11, while descriptions of future physical activity, family, and friends are correlated with good mental health.
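The evaluation metric named above can be illustrated with a minimal sketch. It assumes the standard correction for attenuation, in which the observed Pearson correlation is divided by the square root of the product of the two measures' reliabilities. The function names and the reliability values used here are hypothetical; the reliabilities actually used in the shared task are not stated in this abstract.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def disattenuated_pearson(predicted, observed, rel_predicted, rel_observed):
    """Correct the observed correlation for measurement unreliability.

    rel_predicted and rel_observed are reliability estimates in (0, 1].
    They are assumptions supplied by the caller, not values from the paper.
    """
    if not (0 < rel_predicted <= 1 and 0 < rel_observed <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    r = pearson(predicted, observed)
    return r / math.sqrt(rel_predicted * rel_observed)


if __name__ == "__main__":
    # Toy scores, purely illustrative.
    predicted = [1.0, 2.0, 3.0, 4.0]
    observed = [1.0, 3.0, 2.0, 4.0]
    print(pearson(predicted, observed))
    print(disattenuated_pearson(predicted, observed, 0.64, 1.0))
```

Because the correction divides by a quantity no greater than 1, the disattenuated value is never smaller in magnitude than the raw correlation, and it can exceed 1 when the reliability estimates are too low.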
