A quantifier-based fuzzy classification system for breast cancer patients

OBJECTIVES Recent studies of breast cancer data have identified seven distinct clinical phenotypes (groups) using immunohistochemical analysis and a range of clustering techniques. Consensus between unsupervised classification algorithms has been used successfully to assign patients to these groups, but often at the expense of leaving part of the dataset unclassified. Fuzzy methodologies are known to yield linguistic classification rules. The objective of this study was to investigate whether fuzzy methodologies can produce an easy-to-interpret set of classification rules that places the large majority of patients into one of the specified groups.
MATERIALS AND METHODS In this paper, we extend a data-driven fuzzy rule-based classification system (called the 'fuzzy quantification subsethood-based algorithm') and combine it with a novel class assignment procedure. The whole approach is then applied to a well-characterised breast cancer dataset of ten protein markers measured in over 1000 patients, in order to refine previously identified groups and to present clinicians with a linguistic rule set. A range of statistical approaches was used to compare the obtained classes with previous groupings and to assess the proportion of unclassified patients.
RESULTS The algorithm produced a rule set with one classification rule per class, in which each biomarker is labelled High, Low or Omit, to determine the most appropriate class for each patient. When applied to the whole set of patients, the distribution of the obtained classes agreed with the original reference class distribution with a Kendall's tau of 0.9. Only 38 of the 1073 patients remained unclassified, which makes this class assignment procedure more usable in clinical practice.
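To make the High/Low/Omit rule format concrete, the following is a minimal sketch of how one-rule-per-class fuzzy rules might be evaluated. It is not the fuzzy quantification subsethood-based algorithm itself, nor the paper's class assignment procedure. It assumes marker scores normalised to [0, 1], linear membership functions for High and Low, the minimum as the AND operator, and hypothetical `threshold` and `margin` parameters that leave a patient unclassified when no rule fires strongly enough or the two best rules are nearly tied. It also includes Kosko's fuzzy subsethood measure, S(A, B) = Σ min(μA, μB) / Σ μA, the kind of quantity that subsethood-based rule induction builds on. The marker names (`M1`, `M2`) and rule contents in the usage example are hypothetical.

```python
"""Hypothetical sketch: evaluating one fuzzy rule per class.

Assumptions (not from the paper): scores are normalised to [0, 1],
'High' and 'Low' are linear ramps, AND is the minimum, and a patient is
left unclassified (None) when the best rule is weak or nearly tied.
"""
from typing import Dict, Optional, Sequence

Rule = Dict[str, str]  # marker name -> "High", "Low" or "Omit"


def mu_high(x: float) -> float:
    """Assumed membership of a normalised score in 'High' (clipped ramp)."""
    return min(1.0, max(0.0, x))


def mu_low(x: float) -> float:
    """Assumed membership in 'Low', taken as the complement of 'High'."""
    return 1.0 - mu_high(x)


def subsethood(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Kosko's fuzzy subsethood S(A, B) = sum(min(a, b)) / sum(a)."""
    total = sum(mu_a)
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(mu_a, mu_b)) / total


def firing_strength(patient: Dict[str, float], rule: Rule) -> float:
    """Minimum membership over the markers the rule does not omit."""
    degrees = []
    for marker, label in rule.items():
        if label == "Omit":
            continue  # omitted markers play no part in this rule
        x = patient[marker]
        degrees.append(mu_high(x) if label == "High" else mu_low(x))
    return min(degrees) if degrees else 0.0


def assign_class(patient: Dict[str, float], rules: Dict[str, Rule],
                 threshold: float = 0.5, margin: float = 0.05) -> Optional[str]:
    """Return the class of the strongest rule, or None if unclassified."""
    scores = sorted(((firing_strength(patient, rule), cls)
                     for cls, rule in rules.items()), reverse=True)
    best_score, best_class = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    if best_score < threshold or best_score - runner_up < margin:
        return None  # weak or ambiguous evidence: leave unclassified
    return best_class


if __name__ == "__main__":
    # Hypothetical two-marker, two-class rules for illustration only.
    rules = {
        "A": {"M1": "High", "M2": "Low"},
        "B": {"M1": "Low", "M2": "High"},
    }
    print(assign_class({"M1": 0.9, "M2": 0.2}, rules))  # A
    print(assign_class({"M1": 0.5, "M2": 0.5}, rules))  # None
    # Crisp class members [1, 1, 0, 0] against fuzzy 'High' degrees.
    print(subsethood([1, 1, 0, 0], [0.75, 0.5, 0.25, 0.0]))  # 0.625
```

Leaving a patient unclassified when the top two rules are close is one simple way to trade coverage for confidence; in this sketch, raising `threshold` or `margin` would increase the number of unclassified patients.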
CONCLUSION The fuzzy algorithm provides a simple-to-interpret linguistic rule set that classifies over 95% of breast cancer patients into one of seven clinical groups.
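The coverage figure in the conclusion follows directly from the counts reported in the results (38 unclassified patients out of 1073):

```latex
\frac{1073 - 38}{1073} = \frac{1035}{1073} \approx 0.965
```

That is, about 96.5% of patients are assigned to a class, consistent with the "over 95%" stated above.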
