Robust techniques and applications in fuzzy clustering

by Amit Banerjee

This dissertation addresses issues central to fuzzy classification. It first addresses the sensitivity to noise and outliers of clustering techniques based on least-squares minimization, such as Fuzzy c-Means (FCM) and its variants. Two novel, robust clustering schemes are presented and analyzed in detail; they approach the problem of robustness from different perspectives. The first scheme scales down the FCM memberships of data points based on their distance from the cluster centers, so that outliers receive reduced membership in the true clusters. This scheme, known as Mega-clustering, defines a conceptual mega-cluster, a collective cluster of all data points that views outliers and good points differently (as opposed to the concept of Davé's noise cluster). The scheme is presented and validated with experiments, and its similarities with Noise Clustering (NC) are discussed. The second scheme is based on the feasible solution algorithm, which implements the Least Trimmed Squares (LTS) estimator. The LTS estimator is known to be resistant to noise and has a high breakdown point. The feasible solution approach also guarantees convergence of the solution set to a global optimum. Experiments show the practicability of the proposed schemes in terms of computational requirements and the attractiveness of their simple frameworks.

The validation of clustering results has often received less attention than clustering itself. Fuzzy and non-fuzzy cluster validation schemes are reviewed, and a novel methodology for cluster validity based on a test of the random position hypothesis is developed. The random position hypothesis, which is the null hypothesis in this case, is tested against an alternative clustered hypothesis on every cluster produced by the partitioning algorithm. The Hopkins statistic is used as the basis for accepting or rejecting the null hypothesis. The Hopkins statistic is known to be a fair estimator of randomness in a data set. The concept is borrowed from the clustering tendency domain, and its applicability to validating clusters is shown here.

A unique feature selection procedure for large, high-dimensional molecular conformational data sets is also developed. The intelligent feature extraction scheme not only reduces the dimensionality of the feature space but also eliminates contentious issues such as the labeling of symmetric atoms in the molecule. The feature vectors are converted to a proximity matrix, which is used as input to the fuzzy relational clustering (FRC) algorithm with very promising results. The results are also validated using several cluster validity measures from the literature.

Another application of fuzzy clustering considered here is image segmentation. Image analysis of extremely noisy images is carried out as a precursor to the development of an automated, real-time condition-state monitoring system for underground pipelines. A two-stage FCM with intelligent feature selection is implemented as the segmentation procedure, and results on a test image are presented. A conceptual framework for automated condition-state assessment is also developed.
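To make the validation idea concrete, the following is a minimal sketch of the Hopkins statistic in its commonly used d-dimensional form. It is not taken from the dissertation: the function name `hopkins_statistic`, the default sample size, and the choice of the data's bounding box as the sampling window are all assumptions made for illustration. Values near 0.5 are consistent with randomly positioned points, while values approaching 1 suggest clustering. To validate a single cluster in the spirit described above, one would pass only the points assigned to that cluster.

```python
import numpy as np

def hopkins_statistic(X, m=None, rng=None):
    """Hopkins statistic of the rows of X (hypothetical helper, common formulation).

    Compares nearest-neighbour distances of m sampled data points (w) with
    nearest-data-point distances of m uniform random locations (u).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if m is None:
        m = max(1, n // 10)  # assumed default: sample about 10% of the points

    # w: distance from each sampled data point to its nearest *other* data point
    idx = rng.choice(n, size=m, replace=False)
    dist_w = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    dist_w[np.arange(m), idx] = np.inf  # ignore each point's distance to itself
    w = dist_w.min(axis=1)

    # u: distance from uniform random locations to the nearest data point
    # (assumption: the sampling window is the bounding box of the data)
    lo, hi = X.min(axis=0), X.max(axis=0)
    P = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    su, sw = (u ** d).sum(), (w ** d).sum()
    return float(su / (su + sw))

# Illustrative synthetic data: two tight blobs versus uniformly scattered points
gen = np.random.default_rng(0)
clustered = np.vstack([
    gen.normal(0.0, 0.05, size=(250, 2)),
    gen.normal(5.0, 0.05, size=(250, 2)),
])
uniform = gen.uniform(0.0, 1.0, size=(500, 2))

h_clustered = hopkins_statistic(clustered, m=50, rng=1)
h_uniform = hopkins_statistic(uniform, m=50, rng=1)
print("clustered:", h_clustered)
print("uniform:  ", h_uniform)
```

Under the random position hypothesis this form of the statistic is usually treated as approximately Beta(m, m) distributed, which is what allows a formal accept/reject decision; the sketch only computes the statistic and leaves the choice of significance level open.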
