Granular Fuzzy Possibilistic C-Means Clustering approach to DNA microarray problem

Abstract

Deoxyribonucleic acid (DNA) microarray is an important technology that supports the simultaneous measurement of the expression levels of thousands of genes for biological analysis. With the rapid growth of gene expression data, which are characterized by uncertainty and high dimensionality, there is a genuine need for advanced processing techniques. In this regard, Fuzzy Possibilistic C-Means Clustering (FPCM) and Granular Computing (GrC) are introduced with the aim of solving the problems of feature selection and outlier detection. In this study, taking advantage of both FPCM and GrC, an advanced Fuzzy Possibilistic C-Means Clustering based on Granular Computing (GrFPCM) is proposed. GrFPCM selects features in a preprocessing phase for clustering problems, while the developed granular space is used to cope with uncertainty. Experiments were carried out on various gene expression datasets, and a comparative analysis is reported.
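As a point of reference for the FPCM component, the following is a minimal sketch of the standard fuzzy possibilistic c-means iteration in the form proposed by Pal, Pal and Bezdek (1997). It is not the GrFPCM method of this study, whose granular feature-selection step is not specified here. The iteration alternates three updates: fuzzy memberships that sum to one over clusters for each point (for example, each gene expression profile), typicalities that sum to one over points for each cluster, and prototypes computed as means weighted by u^m + t^eta. The function name `fpcm`, the farthest-point initialisation, and the defaults m = eta = 2 are illustrative assumptions, and the code uses only the Python standard library.

```python
import math


def fpcm(data, c, m=2.0, eta=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy possibilistic c-means (FPCM) on a list of points.

    Hypothetical helper for illustration. Returns (centers, u, t), where
    u[i][k] is the fuzzy membership of point k in cluster i (sums to 1 over
    clusters) and t[i][k] is its typicality (sums to 1 over points).
    u and t are those computed in the last iteration performed.
    """
    n, dim = len(data), len(data[0])
    eps = 1e-12  # floor on squared distances to avoid division by zero

    # Assumed initialisation: deterministic farthest-point choice of prototypes.
    centers = [list(data[0])]
    while len(centers) < c:
        far = max(data, key=lambda p: min(math.dist(p, v) for v in centers))
        centers.append(list(far))

    for _ in range(max_iter):
        # Squared Euclidean distances d2[i][k] between prototype i and point k.
        d2 = [[max(sum((x - y) ** 2 for x, y in zip(data[k], centers[i])), eps)
               for k in range(n)] for i in range(c)]
        # Memberships: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)),
        # written here with squared distances, hence the exponent 1/(m-1).
        u = [[1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1)) for j in range(c))
              for k in range(n)] for i in range(c)]
        # Typicalities: t_ik = 1 / sum_l (d_ik / d_il)^(2/(eta-1)), sum over points l.
        t = [[1.0 / sum((d2[i][k] / d2[i][l]) ** (1.0 / (eta - 1)) for l in range(n))
              for k in range(n)] for i in range(c)]
        # Prototypes: weighted means with weights u_ik^m + t_ik^eta.
        new_centers = []
        for i in range(c):
            w = [u[i][k] ** m + t[i][k] ** eta for k in range(n)]
            total = sum(w)
            new_centers.append([sum(w[k] * data[k][d] for k in range(n)) / total
                                for d in range(dim)])
        # Stop once no prototype moves by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u, t
```

For example, `fpcm([(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)], c=2)` places one prototype near each of the two groups. Because the typicalities of each cluster must sum to one over all n points, their magnitudes shrink as n grows, which is one reason later possibilistic-fuzzy variants relax this constraint.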