Quantitative Assessment of Estimation Approaches for Mining over Incomplete Data in Complex Biomedical Spaces: A Case Study on Cerebral Aneurysms

Biomedical data sources are typically compromised by fragmented data records. This incompleteness reduces the confidence that can be placed in the results of mining algorithms. This paper presents an approach for approximating missing data items, which enables data mining processes to be applied to a larger dataset. The proposed framework is built on a case-based reasoning infrastructure that identifies the data entries most appropriate for supporting the approximation of missing values. The framework is evaluated in a complex biomedical domain: intracranial cerebral aneurysms. The dataset includes a wide range of advanced features obtained from clinical data, morphological analysis, and hemodynamic simulations. The best feature estimates achieved an error of only 7%; estimation accuracy, however, differs considerably between features.
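The general idea of using similar cases to approximate a missing value can be illustrated with a minimal sketch. It is not the framework described in this paper: it assumes a plain nearest-neighbour retrieval step with a Euclidean distance over the features two records share, and it assumes that all values are already scaled to comparable ranges. The feature names (`size`, `neck`, `aspect_ratio`), the records, and the helper functions are hypothetical and serve only as an illustration.

```python
import math

def distance(a, b, features):
    """Euclidean distance over the features that both records have observed."""
    shared = [f for f in features if a.get(f) is not None and b.get(f) is not None]
    if not shared:
        return math.inf  # nothing to compare on
    # Divide by the number of shared features so sparse records stay comparable.
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in shared) / len(shared))

def estimate_missing(query, case_base, target, k=3):
    """Estimate query[target] as the mean over the k most similar cases
    in which the target feature is present (an assumed, simple retrieval rule)."""
    features = [f for f in query if f != target]
    candidates = [c for c in case_base if c.get(target) is not None]
    scored = [(distance(query, c, features), c) for c in candidates]
    scored = [(d, c) for d, c in scored if d < math.inf]
    scored.sort(key=lambda pair: pair[0])
    nearest = [c for _, c in scored[:k]]
    if not nearest:
        return None  # no comparable case available
    return sum(c[target] for c in nearest) / len(nearest)

def relative_error(estimate, actual):
    """Relative estimation error, e.g. 0.07 corresponds to 7%."""
    return abs(estimate - actual) / abs(actual)

# Hypothetical, pre-scaled records; None marks a missing value.
case_base = [
    {"size": 0.20, "neck": 0.30, "aspect_ratio": 1.0},
    {"size": 0.25, "neck": 0.35, "aspect_ratio": 1.2},
    {"size": 0.90, "neck": 0.80, "aspect_ratio": 3.0},
    {"size": 0.22, "neck": None, "aspect_ratio": 1.1},
]
query = {"size": 0.21, "neck": 0.31, "aspect_ratio": None}

estimate = estimate_missing(query, case_base, "aspect_ratio", k=3)
print(estimate)
```

An evaluation in this spirit would hide known values, estimate them from the remaining cases, and report `relative_error` per feature, which makes differences in accuracy between features visible.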
