Abstract
Social media are increasingly used in the healthcare field as a communication tool between patients and caregivers, giving rise to new sources of information rich in knowledge that may improve this field. Social media data analysis has therefore become a real business requirement for healthcare industry actors and data scientists. However, because these data are complex and unstructured, existing natural language processing tools fail to exploit them effectively. The literature offers a wide range of approaches based on dictionaries, linguistic patterns and machine learning, each with its own strengths and weaknesses. In this work, we propose a hybrid system that combines these approaches, taking advantage of each, to extract structured and salient drug abuse information from health-related tweets. We improve the system's accuracy by updating the domain dictionary in real time. We collected 1,000,000 tweets and conducted several experiments showing the benefit of hybridization for efficient information extraction from social media data.
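The abstract does not detail the system's components, so the following is only a minimal Python sketch of how dictionary lookup, a linguistic pattern and a machine-learning classifier might be combined, with the dictionary updated in real time. Everything concrete here is a hypothetical assumption rather than the authors' method: the seed terms (`SEED_TERMS`), the regular expression (`ABUSE_PATTERN`), the toy training tweets (`TRAIN`), the use of multinomial Naive Bayes as the learning component, and the update policy (a pattern-matched term is added to the dictionary only when the classifier also flags the tweet as abuse-related).

```python
import math
import re
from collections import Counter

# Hypothetical seed domain dictionary (the paper's real dictionary is not given).
SEED_TERMS = {"oxycodone", "xanax", "adderall", "percocet"}

# Hypothetical linguistic pattern: an abuse-related verb, an optional quantity
# or determiner, then a candidate drug name of at least three letters.
ABUSE_PATTERN = re.compile(
    r"\b(?:popped|snorted|took|smoked)\s+(?:(?:\d+|a|an|some|my)\s+)?([a-z]{3,})"
)

# Hypothetical toy training tweets, labelled True if they describe drug abuse.
TRAIN = [
    ("popped 2 xanax to get high", True),
    ("snorted adderall to get high", True),
    ("took 3 percocet to get high lol", True),
    ("doctor prescribed oxycodone after surgery", False),
    ("took my medication as prescribed", False),
    ("pharmacy refilled my prescription", False),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (stand-in ML component)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            denom = self.totals[c] + self.vocab_size
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokenize(doc)
            )
        return max(self.classes, key=log_score)


class HybridExtractor:
    """Combines dictionary lookup, a regex pattern and a classifier."""

    def __init__(self, seed_terms, classifier):
        self.drug_dict = set(seed_terms)  # copied, then updated in place
        self.clf = classifier

    def extract(self, tweet):
        text = tweet.lower()
        known = {t for t in tokenize(text) if t in self.drug_dict}  # dictionary step
        candidates = ABUSE_PATTERN.findall(text)                     # pattern step
        is_abuse = self.clf.predict(text)                            # ML step
        new_terms = []
        if is_abuse:
            # Real-time update (hypothetical policy): accept an unknown
            # pattern match only when the classifier flags the tweet.
            for cand in candidates:
                if cand not in self.drug_dict:
                    self.drug_dict.add(cand)
                    new_terms.append(cand)
        return {"drugs": sorted(known | set(new_terms)),
                "abuse": is_abuse, "new_terms": new_terms}


clf = NaiveBayes().fit([d for d, _ in TRAIN], [y for _, y in TRAIN])
extractor = HybridExtractor(SEED_TERMS, clf)
print(extractor.extract("popped 2 percs tonight to get high"))
# {'drugs': ['percs'], 'abuse': True, 'new_terms': ['percs']}
print(extractor.extract("took my medication as prescribed by the doctor"))
# {'drugs': [], 'abuse': False, 'new_terms': []}
print(extractor.extract("snorted percs again"))
# {'drugs': ['percs'], 'abuse': True, 'new_terms': []}
```

In this sketch the pattern alone would also capture "medication" from the second tweet, and the classifier alone cannot name an unseen slang term such as "percs"; combining them lets each component filter or complete the other, and the third call shows the updated dictionary recognizing the newly learned term directly.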