Interpreting PET scans by structured patient data: a data mining case study in dementia research

One goal of medical research on dementia is to correlate images of the brain with clinical tests. Our approach is to start with the images and explain their differences and commonalities in terms of the other variables. First, we cluster positron emission tomography (PET) scans of patients to form groups that share similar features in brain metabolism. To the best of our knowledge, this is the first time that clustering has been applied to whole PET scans. Second, we explain the clusters by relating them to non-image variables. To do so, we employ RSD, an algorithm for relational subgroup discovery, with the cluster membership of patients as the target variable. Our results enable interesting interpretations of differences in brain metabolism in terms of demographic and clinical variables. The approach was implemented and tested on an exceptionally large data collection of patients with different types of dementia. It comprises 10 GB of image data from 454 PET scans and 42 variables from psychological and demographic data, organized in 11 relations of a relational database. We believe that explaining medical images in terms of other variables (patient records, demographic information, etc.) is a challenging, new, and rewarding area for data mining research.
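
The two-step idea (cluster first, then describe cluster membership with non-image variables) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the abstract does not name the clustering algorithm, so plain k-means with a deterministic farthest-first initialization stands in for it; a single hand-written condition replaces RSD's first-order feature construction over relational data; and weighted relative accuracy (WRAcc), a quality measure commonly used in subgroup discovery, is assumed for scoring. All data values and names (`scans`, `ages`, `kmeans`, `wracc`) are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means; a stand-in for the clustering step (assumption)."""
    # Farthest-first initialization keeps the toy example deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def wracc(covered, positive):
    """Weighted relative accuracy: p(cond) * (p(target | cond) - p(target))."""
    n = len(covered)
    n_cond = sum(covered)
    if n_cond == 0:
        return 0.0
    n_both = sum(c and t for c, t in zip(covered, positive))
    return (n_cond / n) * (n_both / n_cond - sum(positive) / n)

# Hypothetical toy data: each "scan" is reduced to two metabolism features.
scans = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
ages = [62, 65, 60, 81, 79, 84]  # hypothetical non-image variable

# Step 1: group scans by similarity.
labels = kmeans(scans, k=2)

# Step 2: score a candidate description of cluster 1 (target = membership).
covered = [age >= 75 for age in ages]   # hypothetical condition: age >= 75
positive = [l == 1 for l in labels]
score = wracc(covered, positive)
print(labels, score)
```

A higher WRAcc means the condition covers many patients and the target cluster is over-represented among them; a subgroup discovery system such as RSD searches over many candidate conditions rather than scoring one fixed rule as this sketch does.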
