Breast tumor classification using a new OWA operator

A method for breast tumor classification based on a new OWA operator is proposed. The new OWA operator is based on the Laplace distribution. Missing values in the data are handled using multiple imputation.

Breast cancer is the most common cancer among Canadian women and the second leading cause of cancer death. Fine needle aspirate (FNA) is a technique used to examine breast tumors at an early stage in order to detect cancer. In this paper, we demonstrate the application of a new ordered weighted averaging (OWA) operator to the problem of breast tumor classification. The OWA operator uses the Laplace distribution to calculate the weight vector that aggregates the uncertain information about the breast tumors. The aggregated information is used, along with the tumor label (benign or malignant), to train nearest neighbor, support vector machine, and logistic regression classifiers. With the nearest neighbor classifier, the method achieves 99.71% accuracy, outperforming other studies that apply other OWA operators to the same dataset.
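The aggregation step can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the code below assumes a form analogous to the normal-distribution OWA weights: weights proportional to exp(-|i - mu| / b), centered on the middle rank. The function names, the default choice of the scale b, and the sample feature values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def laplace_owa_weights(n, scale=None):
    """Return n OWA weights shaped like a Laplace density over ranks 1..n.

    Assumed form (not confirmed by the paper): w_i is proportional to
    exp(-|i - mu| / b) with mu = (n + 1) / 2.
    """
    positions = np.arange(1, n + 1, dtype=float)
    mu = (n + 1) / 2.0
    if scale is None:
        # Assumption: use the mean absolute deviation of the ranks as b.
        scale = np.mean(np.abs(positions - mu))
        if scale == 0.0:  # happens only when n == 1
            scale = 1.0
    raw = np.exp(-np.abs(positions - mu) / scale)
    return raw / raw.sum()  # normalize so the weights sum to 1


def owa(values, weights):
    """Apply an OWA operator: sort values in descending order, then take
    the weighted sum with the rank-based weights."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]
    return float(ordered @ weights)


# Hypothetical feature vector for one tumor sample (illustrative values only).
sample = [5, 1, 1, 1, 2, 1, 3, 1, 1]
weights = laplace_owa_weights(len(sample))
aggregated = owa(sample, weights)
```

Because the weights peak at the middle rank, extreme feature values contribute less to the aggregate than central ones. The resulting scalar could then be passed, together with the benign or malignant label, to a classifier such as nearest neighbor.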
