SFCM: A Fuzzy Clustering Algorithm of Extracting the Shape Information of Data

Topological data analysis is an emerging field that applies topological techniques to data mining. It helps uncover the topological structure of data, focusing on the global shape of high-dimensional data rather than on local information. The Mapper algorithm is regarded as a sound representative of this approach. It clusters data and identifies concise, meaningful global topological structures that are out of reach for many other clustering methods. In this article, we propose a new method, the Shape Fuzzy C-Means (SFCM) algorithm, which builds on the Fuzzy C-Means algorithm and incorporates particular features of the Mapper algorithm. SFCM not only exhibits the same clustering ability as Fuzzy C-Means but also reveals relationships in the data by visualizing its global shape, as Mapper does. We present a formal proof and experiments to support these claims. The performance of the enhanced algorithm is demonstrated through a comparative analysis on synthetic and real-world data against the original Fuzzy C-Means algorithm, Mapper, and F-Mapper, another improved algorithm based on fuzzy sets. The comparison considers output visualization in the topological sense and clustering stability.
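For readers unfamiliar with the base method, the following is a minimal sketch of the standard Fuzzy C-Means iteration that SFCM is said to build on. It is not the SFCM algorithm itself and includes none of the Mapper-derived features. The function name `fuzzy_c_means`, its parameters, and the toy data are illustrative assumptions, not taken from the article.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means (illustrative sketch, not SFCM).

    Returns (centers, u), where u[k][i] is the membership of point k
    in cluster i and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centers: means of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][d] for k in range(n)) / total
                for d in range(dim)
            ))

        # Memberships: proportional to distance^(-2 / (m - 1)).
        new_u = []
        for k in range(n):
            dist = [euclidean(points[k], centers[i]) for i in range(c)]
            if any(d == 0.0 for d in dist):
                # Point coincides with a center: assign it fully there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                total = sum(hits)
                new_u.append([h / total for h in hits])
                continue
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])

        # Stop once memberships have stabilised.
        change = max(
            abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy data (assumed for illustration): two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fuzzy_c_means(points, c=2)
labels = [max(range(2), key=lambda i: row[i]) for row in u]
```

Each point receives a graded membership in every cluster rather than a single hard label; the hard `labels` above are obtained only by taking the largest membership per point.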
