Exploring Disease Association from the NHANES Data: Data Mining, Pattern Summarization, and Visual Analytics

Finding associations among diseases is an important task in medical data mining, and the NHANES data is a valuable source for exploring such associations. However, existing studies of the NHANES data focus on using statistical techniques to test a small number of hypotheses; the data has not been systematically mined for disease association patterns. To address this gap, this paper proposes two methods for exploring the NHANES data: a direct disease pattern mining method and an interactive disease pattern mining method. Results on the latest NHANES data show that these methods can mine meaningful disease associations that are consistent with existing knowledge and the literature. The study also summarizes the data set through a disease influence graph and a disease hierarchical tree.
