Spatial and temporal epidemiological analysis in the Big Data era

Abstract

Concurrent with global economic development over the last 50 years, the opportunities for the spread of existing diseases and the emergence of new infectious pathogens have increased substantially. The activities associated with this greatly intensified global connectivity have resulted in large amounts of data being generated, which in turn provides opportunities for generating knowledge that will allow more effective management of animal and human health risks. This so-called Big Data has more recently been accompanied by the Internet of Things, which reflects the increasing presence of a wide range of sensors interconnected via the Internet. Analysis of this data needs to exploit its complexity, accommodate variation in data quality and take advantage of its spatial and temporal dimensions, where available. Apart from the development of hardware technologies and networking/communication infrastructure, it is necessary to develop appropriate data management tools that make this data accessible for analysis. These include relational databases, geographical information systems and, most recently, cloud-based data storage such as Hadoop distributed file systems. While the development of analytical methodologies has not quite caught up with the data deluge, important advances have been made in a number of areas, including spatial and temporal data analysis, where the spectrum of analytical methods ranges from visualisation and exploratory analysis to modelling. Methodological development for data analysis used to focus primarily on statistical science. The newly emerged discipline of data science reflects the challenges presented by the need to integrate diverse data sources and exploit them using novel data- and knowledge-driven modelling methods, while simultaneously recognising the value of quantitative as well as qualitative analytical approaches.
Machine learning regression methods, which are more robust and can handle large datasets faster than classical regression approaches, are now also used to analyse spatial and spatio-temporal data. Multi-criteria decision analysis methods have gained greater acceptance, due in part to the increasing need to combine data from diverse sources, including published scientific information and expert opinion, in an attempt to fill important knowledge gaps. The opportunities for more effective prevention, detection and control of animal health threats arising from these developments are immense, but they are not without risks, given the different types and much higher frequency of biases associated with these data.
