Association Rule Learning and Frequent Sequence Mining of Cancer Diagnoses in New York State

Analyzing large-scale patient diagnosis histories can help uncover comorbidity and disease progression patterns. Open data initiatives have recently made statewide patient data available at the individual level, for example the New York State SPARCS data. This study explores frequent disease co-occurrence and sequence patterns among cancer patients in New York State using SPARCS data. Our collection includes 18,208,830 discharge records from 1,565,237 patients with cancer-related diagnoses during 2011–2015. We use the Apriori algorithm to discover the top disease co-occurrences for common cancer categories, ranked by support. We use the cSPADE algorithm to generate the most frequent diagnosis sequences that contain at least one cancer-related diagnosis from patients’ diagnosis histories. This data-driven approach provides knowledge that supports the investigation of disease co-occurrence and progression patterns, with the aim of improving the management of multiple diseases.
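
To make the notion of support-based co-occurrence mining concrete, the following is a minimal sketch of Apriori-style frequent itemset mining in plain Python. It is not the implementation used in the study: the function name `apriori`, the toy patient records, the diagnosis labels, and the `min_support` value of 0.4 are all hypothetical and chosen only for illustration. Here the support of a set of diagnoses is the fraction of patients whose record contains every diagnosis in the set.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: one set of diagnosis labels per patient (not SPARCS data).
records = [
    {"lung cancer", "COPD", "hypertension"},
    {"lung cancer", "COPD"},
    {"breast cancer", "hypertension"},
    {"lung cancer", "hypertension"},
    {"breast cancer", "hypertension", "diabetes"},
]

result = apriori(records, min_support=0.4)

# List co-occurring pairs, highest support first.
pairs = sorted(((s, sorted(i)) for i, s in result.items() if len(i) == 2),
               reverse=True)
for s, names in pairs:
    print(f"{names}: support={s:.2f}")
```

Ranking the resulting pairs by support and keeping those that include a cancer diagnosis mirrors, in miniature, the idea of reporting top co-occurrences per cancer category. Sequence mining with cSPADE additionally takes the order of diagnoses over time into account, which this sketch does not attempt.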
