Temporal Variation of Terms as Concept Space for Early Risk Prediction

Abstract. Early risk prediction involves three different aspects to be considered when an automatic classifier is implemented for this task: a) support for classification with partial information read up to different time steps, b) support for dealing with unbalanced data sets, and c) a policy to decide when a document could be classified as belonging to the relevant class with reasonable confidence. In this paper we propose an approach that naturally copes with the first two aspects and shows good prospects for dealing with the last one. Our proposal, named temporal variation of terms (TVT), is based on using the variation of vocabulary along the different time steps as the concept space to represent the documents. Results with the eRisk 2017 data set show a better performance of TVT in comparison to other successful semantic analysis approaches and the standard BOW representation. Besides, it also reaches the best results reported to date for the ERDE5 and ERDE50 error evaluation measures.