Disease outbreaks have afflicted humanity since ancient times. Because information travelled slowly, outbreaks often had devastating effects. Today data is a valuable asset, yet it is still not used as well as it could be. With news media, blogs and social networks now widespread, it is feasible to relate disease outbreaks to what is reported on information sites. Our system uses Twitter data as its main source, supplemented by other news sources. As a use case, the Ebola outbreak is used for the analysis and for building a framework that helps detect outbreaks in any part of the world. The model is built on bag-of-words, TF-IDF and lexical features. Almost all of the process is automated: retrieving relevant tweets and clustering similar events together, coordinating work bots that each have a specific job, and automatically finding the parameter values that give the best results. The system also analyzes text in languages such as English and French.
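To make the bag-of-words, TF-IDF and clustering steps concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the smoothed IDF formula, the greedy single-pass clustering and the similarity threshold of 0.3 are all assumptions chosen for illustration, and the example tweets are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (bag of words).

    \\w matches Unicode letters, so accented French words stay intact.
    """
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (a dict) per document.

    Assumption: smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(token_lists)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def cluster(docs, threshold=0.3):
    """Greedy clustering: a document joins the first cluster whose
    first member is at least `threshold` similar, else starts a new one.

    The threshold is a hypothetical value; a real system would tune it.
    """
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up example tweets, for illustration only.
tweets = [
    "Ebola outbreak reported in Guinea",
    "New Ebola cases reported in Guinea this week",
    "Cas d'Ebola signalés en Guinée",
    "Great football match tonight",
]
print(cluster(tweets))
```

The greedy pass keeps the sketch short; a production pipeline would more likely use an established clustering algorithm and search over the threshold automatically, in line with the automated parameter tuning the abstract describes.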