Modeling Antimicrobial Prescriptions in Scotland: A Spatiotemporal Clustering Approach

In 2016 the British government acknowledged the importance of reducing antimicrobial prescriptions in order to avoid the long-term harmful effects of over-prescription. Prescription needs depend strongly on factors with a spatio-temporal component, such as the presence of a bacterial outbreak and the population density. In this context, density-based clustering algorithms are flexible tools for analysing data by searching for group structures. The case of Scotland presents an additional challenge because population density varies widely across the study area. We present a spatio-temporal clustering approach for characterising the prescribing behaviour of general practitioners (GPs) in Scotland. In particular, we consider the density-based spatial clustering of applications with noise (DBSCAN) algorithm because it can incorporate both spatial and temporal data and can be extended with further variables. We extend this approach in two directions. For the temporal analysis, we use dynamic time warping (DTW) to measure the dissimilarity between warped and shifted time series. For the spatial component, we introduce a new way of weighting spatial distances with continuous weights derived from kernel density estimation (KDE). This makes our approach suitable for spatial clusters with differing densities, a well-known weakness of the original DBSCAN. On both simulated and empirical data, our approach performs better than the original DBSCAN and the popular k-means algorithm, with evidence that it clusters more elements correctly and delivers actionable insights.
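The approach combines three ingredients: DTW dissimilarity between prescription time series, KDE-derived weights on spatial distances, and DBSCAN. The sketch below wires them together in plain Python under assumptions the abstract does not specify, so it is an illustration rather than the authors' method. The weighting formula is hypothetical: each distance is multiplied by (sqrt(f_i f_j) / g)^0.5, where f is a fixed-bandwidth Gaussian KDE density at each practice and g is the geometric-mean density. The neighbourhood rule is also assumed, in the style of ST-DBSCAN: two practices are neighbours only if the weighted spatial distance is at most `eps_s` and the DTW distance is at most `eps_t`. All function names, parameters, coordinates and series are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute-difference local cost and no warping window."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # cheapest monotone alignment of a[:i] with b[:j]
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def kde_density(points: List[Point], h: float) -> List[float]:
    """Fixed-bandwidth 2-D Gaussian KDE evaluated at every input point."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * h * h)
    return [norm * sum(math.exp(-math.dist(p, q) ** 2 / (2.0 * h * h)) for q in points)
            for p in points]


def kde_weighted_distances(points: List[Point], h: float) -> List[List[float]]:
    """HYPOTHETICAL weighting: in 2-D the typical spacing scales like density**-0.5,
    so each distance is multiplied by (sqrt(f_i * f_j) / g) ** 0.5, where g is the
    geometric-mean density. Dense areas are stretched, sparse areas shrunk."""
    log_f = [math.log(f) for f in kde_density(points, h)]
    log_g = sum(log_f) / len(log_f)
    return [[math.dist(p, q) * math.exp(0.5 * (0.5 * (log_f[i] + log_f[j]) - log_g))
             for j, q in enumerate(points)]
            for i, p in enumerate(points)]


def dbscan(n: int, is_neighbour: Callable[[int, int], bool], min_pts: int) -> List[int]:
    """Plain DBSCAN over an arbitrary neighbourhood predicate; -1 marks noise."""
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = [j for j in range(n) if is_neighbour(i, j)]  # includes i itself
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = [k for k in range(n) if is_neighbour(j, k)]
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(reach)
    return [-1 if label is None else label for label in labels]


def st_dbscan_labels(points: List[Point], series: List[Sequence[float]],
                     eps_s: float, eps_t: float, min_pts: int,
                     h: Optional[float] = None) -> List[int]:
    """Neighbours must be close in (optionally KDE-weighted) space AND in DTW."""
    if h is None:
        spatial = [[math.dist(p, q) for q in points] for p in points]
    else:
        spatial = kde_weighted_distances(points, h)
    temporal = [[dtw_distance(a, b) for b in series] for a in series]
    return dbscan(len(points),
                  lambda i, j: spatial[i][j] <= eps_s and temporal[i][j] <= eps_t,
                  min_pts)


# Toy data (hypothetical): a dense "urban" square, a sparse "rural" square, and one
# practice inside the urban square whose time series looks nothing like its neighbours'.
urban = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
rural = [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0)]
points = urban + rural + [(0.05, 0.05)]
series = ([[1, 2, 3, 2, 1], [1, 1, 2, 3, 2, 1], [1, 2, 3, 3, 2, 1], [1, 2, 3, 2, 1, 1]]
          + [[2, 3, 2], [2, 2, 3, 2], [2, 3, 3, 2], [2, 3, 2, 2]]
          + [[10, 10, 10, 10, 10]])

print(st_dbscan_labels(points, series, eps_s=0.8, eps_t=1.0, min_pts=3))
# -> [0, 0, 0, 0, -1, -1, -1, -1, -1]
print(st_dbscan_labels(points, series, eps_s=0.8, eps_t=1.0, min_pts=3, h=0.5))
# -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With raw Euclidean distances, a single `eps_s = 0.8` turns the sparse square (spacing 1.0) into noise. The KDE weighting shrinks those distances to about 0.69 and recovers the square as a second cluster, while the dense square stays intact. The co-located practice with a very different time series remains noise in both runs because of the DTW threshold. The series within each square are warped copies of one shape, so their DTW distance is 0. Because the KDE smooths the densities, this hypothetical correction only partially equalises the two spacings.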
