An Analysis of the Survivability in SEER Breast Cancer Data Using Association Rule Mining

Medical professionals need a reliable methodology to predict the survivability of patients with breast cancer. In this work, a classical association rule mining algorithm, Apriori, was adopted to analyze the association between the medical attributes in patient records and patient survivability. The SEER dataset was used in this research. After preprocessing, 29,606 records were obtained, each containing 17 breast cancer related attributes. The Apriori algorithm was then applied to these preprocessed records, yielding 326 association rules about ‘survived’ and 22 association rules about ‘not survived’. After analyzing and comparing the discovered rules, we found that the attributes EOD-Lymph Node Involv and SEER historic stage A play important roles in patient survivability.
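As a rough illustration of the technique named above, the following is a minimal sketch of Apriori followed by rule extraction for a fixed outcome item. It is not the authors' implementation: the function names `apriori` and `rules_for`, the thresholds, and the six toy records (including the item labels such as `stage=localized`) are invented for illustration and are not SEER values or results from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules_for(frequent, target, min_confidence):
    """Return rules X -> target as (antecedent, target, support, confidence)."""
    rules = []
    for itemset, support in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            # The antecedent is frequent too, by downward closure.
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, target, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], sorted(r[0])))


# Hypothetical toy records; each record is a set of attribute=value items
# plus an outcome item. These are NOT taken from the SEER dataset.
records = [
    {"stage=localized", "nodes=negative", "survived"},
    {"stage=localized", "nodes=negative", "survived"},
    {"stage=localized", "nodes=positive", "survived"},
    {"stage=regional", "nodes=positive", "survived"},
    {"stage=distant", "nodes=positive", "not_survived"},
    {"stage=distant", "nodes=positive", "not_survived"},
]

frequent = apriori(records, min_support=0.3)
survived_rules = rules_for(frequent, "survived", min_confidence=0.8)
not_survived_rules = rules_for(frequent, "not_survived", min_confidence=0.8)

for antecedent, target, support, confidence in survived_rules + not_survived_rules:
    print(sorted(antecedent), "->", target,
          f"support={support:.2f} confidence={confidence:.2f}")
```

Restricting the consequent to a single outcome item mirrors the abstract's split into rules about ‘survived’ and rules about ‘not survived’; the support and confidence thresholds used in the paper are not stated in the abstract, so the values here are placeholders.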
