Applying support vector machines to breast cancer diagnosis using screen film mammogram data

This paper explores the use of different support vector machine (SVM) kernels, and combinations of kernels, to assess diagnostic accuracy on a screen film mammogram data set containing approximately 2500 samples from five different institutions. This research has demonstrated that: (1) specificity improves, on average, by about 4% at 100% sensitivity and by about 18% at 98% sensitivity. This means that approximately 52 and 134 women would not have to undergo biopsy at 100% and 98% sensitivity, respectively, when compared to the case of every woman being biopsied, which would be necessary to identify all cancers in the absence of a computer-aided diagnostic (CAD) process; (2) positive predictive value (PPV) at these same values of sensitivity is much better, ranging from 48% to 51% as sensitivity is decreased from 100% to 98%. Finally, the average specificity over the top 10% of the ROC curve (that is, the average specificity between 90% and 100% sensitivity) is about 30%. This means that, on average, 440 women would not have to undergo biopsy when compared to the case of all women being biopsied.
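
The quantities reported above (specificity and PPV at a fixed sensitivity, and the number of biopsies avoided) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `operating_point`, the thresholding rule, and the ten scores below are hypothetical, chosen only to show how the metrics relate. It assumes each case has a classifier score (higher means more suspicious) and a label (1 = malignant, 0 = benign), and that every case scoring at or above the threshold is sent to biopsy.

```python
import math


def operating_point(scores, labels, target_sensitivity):
    """Metrics at the highest threshold whose sensitivity meets the target.

    Hypothetical helper for illustration; scores and labels are parallel lists.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]

    # Number of malignant cases that must be flagged to reach the target.
    needed = max(1, math.ceil(target_sensitivity * len(pos)))
    threshold = pos[needed - 1]  # flag every case scoring >= threshold

    tp = sum(s >= threshold for s in pos)  # cancers sent to biopsy
    fp = sum(s >= threshold for s in neg)  # benign cases sent to biopsy
    tn = len(neg) - fp                     # benign cases spared a biopsy

    return {
        "sensitivity": tp / len(pos),
        "specificity": tn / len(neg),
        "ppv": tp / (tp + fp),
        "biopsies_avoided": tn,
    }


# Synthetic toy data (not from the paper): 4 malignant and 6 benign cases.
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for target in (1.0, 0.75):
    print(target, operating_point(scores, labels, target))
```

In this toy setting, lowering the required sensitivity raises the threshold, so more benign cases fall below it. This is the same trade-off the abstract describes between 100% and 98% sensitivity: specificity and PPV rise as the sensitivity requirement is relaxed. Biopsying every woman corresponds to a threshold below all scores, which gives 100% sensitivity and 0% specificity.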