Fuzzification of principal component analysis for data dimensionality reduction

Principal component analysis (PCA) extracts a small number of uncorrelated components from the original high-dimensional data space and is widely used for data analysis. Classical PCA is based on orthogonal projection defined in a convex vector space. Consequently, the distance between two projected vectors can never exceed the distance between the corresponding objects before projection. When PCA fails to capture the data structure, the projection therefore does not necessarily preserve the true similarity of the data in the higher-dimensional space, which makes the results unacceptable. To avoid this problem in high-dimensional data clustering, we propose a new Fuzzy PCA (FPCA) algorithm. Its novelty lies in extracting the similarity structures of objects in the high-dimensional space, together with the dissimilarities between objects based on their cluster structure and dimension, when the algorithm is used in conjunction with established fuzzy clustering algorithms such as FCM, GK, or GG. This is achieved by using fuzzy membership functions and by modifying the classical PCA approach so that the similarity structures are taken into account when constructing the projections into the lower-dimensional space. The effectiveness of the proposed algorithm is tested on several benchmark data sets, and clustering efficiency is evaluated with validation measures.
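The distance-shrinking property of classical PCA mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the proposed FPCA algorithm: it runs ordinary PCA via the singular value decomposition on synthetic Gaussian data and compares pairwise Euclidean distances before and after projection. The helper names `pca_project` and `pairwise_distances`, the data shape, and the random seed are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X onto the top-k principal directions (classical PCA)."""
    Xc = X - X.mean(axis=0)  # center the data
    # Rows of Vt are orthonormal principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def pairwise_distances(Z):
    """Return the matrix of Euclidean distances between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Synthetic data (assumption): 50 objects in a 10-dimensional space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))

Z = pca_project(X, k=2)           # reduced 2-dimensional representation
D_high = pairwise_distances(X)    # distances in the original space
D_low = pairwise_distances(Z)     # distances after projection

# Orthogonal projection is non-expansive: no pairwise distance can grow.
contraction_holds = bool(np.all(D_low <= D_high + 1e-9))
print("all projected distances <= original distances:", contraction_holds)
```

Because the projection can only shrink distances, two objects that are far apart in the original space may end up close together in the reduced space. This is the loss of similarity structure that motivates weighting the projection by fuzzy cluster memberships.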

