Lesion classification using clinical and visual data fusion by multiple kernel learning

To overcome operator dependency and to increase diagnosis accuracy in breast ultrasound (US), a lot of effort has been devoted to developing computer-aided diagnosis (CAD) systems for breast cancer detection and classification. Unfortunately, the efficacy of such CAD systems is limited since they rely on correct automatic lesions detection and localization, and on robustness of features computed based on the detected areas. In this paper we propose a new approach to boost the performance of a Machine Learning based CAD system, by combining visual and clinical data from patient files. We compute a set of visual features from breast ultrasound images, and construct the textual descriptor of patients by extracting relevant keywords from patients' clinical data files. We then use the Multiple Kernel Learning (MKL) framework to train SVM based classifier to discriminate between benign and malignant cases. We investigate different types of data fusion methods, namely, early, late, and intermediate (MKL-based) fusion. Our database consists of 408 patient cases, each containing US images, textual description of complaints and symptoms filled by physicians, and confirmed diagnoses. We show experimentally that the proposed MKL-based approach is superior to other classification methods. Even though the clinical data is very sparse and noisy, its MKL-based fusion with visual features yields significant improvement of the classification accuracy, as compared to the image features only based classifier.