Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. Thus, while autism communities are rarely found in the real world, they are popular in Web-based forums, which are convenient places for people with ASD to seek and share health-related information. Reddit is one such avenue, where people with common interests connect and form communities of specific interest called subreddits. This work aims to estimate the support provided by a popular subreddit focused on ASD, www.reddit.com/r/aspergers. Support scores were measured in terms of both the quantity and the quality of the conversations in the forum, covering conversational involvement, emotional support, and informational support. The support scores of the Aspergers subreddit were compared with those of an average subreddit derived from the whole of Reddit, represented by two large corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, with far higher support scores than the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate content posted in Aspergers from content posted in other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports.
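The comparison between one subreddit and the "average subreddit" can be illustrated with a minimal sketch. It assumes, purely for illustration, that conversational involvement is approximated by the mean number of comments per post; the score definitions actually used in this work are not given here. The record layout, the function names, and the toy data are all hypothetical, and the average is taken over every subreddit in the sample, including the target.

```python
from collections import defaultdict
from statistics import mean


def involvement_by_subreddit(posts):
    """Mean number of comments per post for each subreddit.

    `posts` is an iterable of (subreddit, n_comments) pairs. This record
    layout is a hypothetical stand-in for the Reddit post corpus.
    """
    totals = defaultdict(lambda: [0, 0])  # subreddit -> [comment sum, post count]
    for subreddit, n_comments in posts:
        totals[subreddit][0] += n_comments
        totals[subreddit][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


def compare_to_average(scores, target):
    """Return (score of `target`, mean score over all subreddits)."""
    return scores[target], mean(scores.values())


# Toy data, invented for illustration only (not results from the study).
toy_posts = [
    ("aspergers", 12), ("aspergers", 8),
    ("pics", 3), ("pics", 1),
    ("news", 4), ("news", 2),
]
scores = involvement_by_subreddit(toy_posts)
target_score, average_score = compare_to_average(scores, "aspergers")
```

On corpora of the size described above, the same per-subreddit grouping would presumably be expressed as a distributed aggregation in Spark rather than as an in-memory loop.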