Finding Different Types of Medical Conditions: From Data Generation to Automatic Classification

• In our study, we collected 19,313 spontaneous patient reports about antibiotic treatment experiences from medical forums. We annotated 760 sentences containing different types of medical conditions and, using natural language processing and machine learning techniques, classified 5,179 unique conditions across the set of reports as either indications or side effects.
• Each of the 760 annotated sentences was given a binary label indicating whether the condition played the role of an “indication” or a “side effect” of the given treatment. Specific treatment and condition mentions in the text were replaced by generic text labels (i.e., _TREATMENT, _CONDITION) to prevent overfitting to the antibiotic drug class.
• We used an SVM classifier with 5-fold cross-validation and averaged the F1 scores across the folds (see Figure 2).
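
The mention masking and the fold-averaged F1 evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the function names (`mask_mentions`, `k_fold_indices`, `cross_validated_f1`), the regex-based matching, the unshuffled contiguous folds, and the choice of "side effect" as the positive class are all assumptions. The classifier is passed in as a hypothetical `fit_predict` callable, which is where an SVM would be plugged in.

```python
import re
from statistics import mean


def mask_mentions(sentence, treatments, conditions):
    """Replace specific treatment/condition mentions with generic labels."""
    def mask(text, terms, label):
        # Longest terms first, so multi-word mentions are masked before substrings.
        for term in sorted(terms, key=len, reverse=True):
            pattern = r"\b" + re.escape(term) + r"\b"
            text = re.sub(pattern, label, text, flags=re.IGNORECASE)
        return text

    sentence = mask(sentence, treatments, "_TREATMENT")
    return mask(sentence, conditions, "_CONDITION")


def f1_score(y_true, y_pred, positive="side effect"):
    """F1 for one class; the positive class chosen here is an assumption."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validated_f1(X, y, fit_predict, k=5):
    """Average the per-fold F1 scores.

    fit_predict(train_X, train_y, test_X) is a hypothetical hook that trains
    a classifier (e.g. an SVM) and returns predicted labels for test_X.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        scores.append(f1_score([y[i] for i in test], preds))
    return mean(scores)
```

With 760 labeled sentences and k=5, each fold in this sketch holds 152 test sentences. In practice the folds would usually be shuffled or stratified by label before splitting.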