The ReachOut clinical psychology shared task addresses the problem of automatically triaging posts to a support forum for people with a history of mental health issues. Posts are classified into four categories: green, amber, red and crisis. The non-green categories correspond to increasing levels of urgency for some form of intervention. The Thomson Reuters submissions arose from an idea combining self-training and ensemble learning. The available labeled training set is small (947 examples) and its class distribution is imbalanced, so the aim was to develop a method that could also exploit the larger set of unlabeled posts provided by the organisers. This approach did not succeed, but a radial basis function (RBF) SVM, intended only as a baseline, performed relatively well. The report therefore focuses on the latter and tries to understand the reasons for its performance.
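To make the baseline concrete, here is a minimal sketch of an RBF-kernel SVM text classifier built with scikit-learn. The abstract does not say which features or hyperparameters the submissions used, so the TF-IDF features, the `ngram_range`, `C` and `gamma` values, `class_weight="balanced"` and the helper name `build_rbf_baseline` are all assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# The four triage labels named in the task description.
LABELS = ["green", "amber", "red", "crisis"]


def build_rbf_baseline(C=1.0, gamma="scale"):
    """Hypothetical baseline: TF-IDF features fed to an RBF-kernel SVM.

    Assumptions (not taken from the report): word unigrams and bigrams,
    sublinear term frequency, and class_weight="balanced", which reweights
    each class inversely to its frequency as one simple response to an
    imbalanced label distribution.
    """
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        SVC(kernel="rbf", C=C, gamma=gamma, class_weight="balanced"),
    )
```

Calling `build_rbf_baseline().fit(posts, labels)` on the labeled posts and then `predict` on new posts yields one of the four labels per post. In practice `C` and `gamma` would normally be tuned by cross-validation, and class weighting is only one of several ways to handle the imbalance; neither choice should be read as the configuration used in the report.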