Improving recall values in breast cancer diagnosis with Incremental Background Knowledge

Cancer diagnosis is generally the process of detecting the disease from the results of physical or genetic tests and exams, usually referred to as patient data. A major problem for cancer diagnosis systems is the scarcity of labeled data, together with the difficulty of labeling existing unlabeled data. There is therefore growing interest in using unlabeled data to improve classification performance in cancer diagnosis, since such data may be readily available in some applications and so makes an appealing source of information. In this work we explore an Incremental Background Knowledge (IBK) technique that introduces unlabeled data into the training set, expanding it with examples labeled by initial classifiers, in order to better support decisions, namely by improving recall values. The proposed incremental margin-based SVM method was tested on the Wisconsin-Madison breast cancer diagnosis problem to examine how effectively such techniques can support diagnosis.
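The abstract does not spell out the exact selection rule, so the following is only a minimal sketch of the general idea under stated assumptions. A small Pegasos-style linear SVM stands in for a full SVM solver. At each step the unlabeled examples farthest from the decision boundary (largest |f(x)|) are assumed to be the most trustworthy, so they are added with their predicted labels and the model is retrained. The helper names, the `per_step` and `steps` parameters, and the toy 2-D points are all hypothetical and are not taken from the paper or from the Wisconsin-Madison data. Recall, TP / (TP + FN), is computed for the positive class, which in a diagnosis setting would typically be the malignant cases.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _augment(x: Point) -> List[float]:
    # Append a constant feature so the last weight acts as the bias term.
    return list(x) + [1.0]


def train_linear_svm(X: List[Point], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    Hypothetical stand-in for a full SVM solver; labels must be +1 / -1.
    """
    rng = random.Random(seed)
    data = [(_augment(x), t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            violates = t * _dot(w, x) < 1.0  # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if violates:
                w = [wj + eta * t * xj for wj, xj in zip(w, x)]
    return w


def decision(w: List[float], x: Point) -> float:
    """Signed SVM output f(x); its magnitude serves as a confidence score."""
    return _dot(w, _augment(x))


def predict(w: List[float], x: Point) -> int:
    return 1 if decision(w, x) >= 0.0 else -1


def incremental_background_knowledge(X_lab, y_lab, X_unlab, per_step=2, steps=3):
    """Grow the training set with pseudo-labelled unlabeled examples.

    Assumed rule: at each step, add the `per_step` pool examples with the
    largest |f(x)| using their predicted labels, then retrain the SVM.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    w = train_linear_svm(X_train, y_train)
    for _ in range(steps):
        if not pool:
            break
        pool.sort(key=lambda x: abs(decision(w, x)), reverse=True)
        chosen, pool = pool[:per_step], pool[per_step:]
        for x in chosen:
            X_train.append(x)
            y_train.append(predict(w, x))
        w = train_linear_svm(X_train, y_train)
    return w, X_train, y_train


def recall(w, X, y, positive=1) -> float:
    """TP / (TP + FN) for the positive class (e.g. malignant cases)."""
    tp = sum(1 for x, t in zip(X, y) if t == positive and predict(w, x) == positive)
    fn = sum(1 for x, t in zip(X, y) if t == positive and predict(w, x) != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy usage: two labeled points and six unlabeled ones (illustrative only).
X_lab, y_lab = [(2.0, 2.0), (-2.0, -2.0)], [1, -1]
X_unlab = [(3.0, 3.0), (2.5, 3.0), (1.5, 2.0), (-3.0, -3.0), (-2.5, -3.0), (-1.5, -2.0)]
w, X_train, y_train = incremental_background_knowledge(X_lab, y_lab, X_unlab)
print(len(X_train), recall(w, [(2.0, 3.0), (1.5, 1.5), (-2.0, -3.0)], [1, 1, -1]))
```

Selecting by the largest |f(x)| keeps the added pseudo-labels as reliable as possible, at the cost of adding examples that lie far from the boundary and therefore move it little. A real study would also tune how many examples are added per step and how many steps are run.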
