Detecting and Verifying Dissimilar Patterns in Unlabelled Data

Clustering unlabelled data is a difficult problem with applications in many fields. It becomes much harder when the input space has many dimensions, the number of distinct patterns in the data is not known a priori, and the features are measured on different scales. This paper addresses that setting. Our approach extends hierarchical clustering to make it suitable for data sets with numerous independent features. The results of this initial clustering are then refined by a reclassification step. We also discuss the evaluation of hierarchical clustering methods. The performance of the proposed methodology is demonstrated on a synthetic data set and verified on a variety of well-known machine learning data sets.
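The general shape of such a pipeline (rescale features, cluster agglomeratively without fixing the number of clusters in advance, then refine by reclassification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the extension proposed in the paper: the centroid-based merging rule, the `stop_distance` threshold, the single nearest-centroid reclassification pass, and all function names are hypothetical choices made only for this example.

```python
import math
import statistics


def standardize(rows):
    """Rescale each feature to zero mean and unit variance (assumed preprocessing)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def centroid(points):
    """Component-wise mean of a list of points."""
    return [statistics.fmean(c) for c in zip(*points)]


def agglomerate(points, stop_distance):
    """Repeatedly merge the two clusters with the closest centroids.

    Merging stops once the closest pair is farther apart than stop_distance,
    so the number of clusters is not fixed beforehand (hypothetical criterion).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        cents = [centroid([points[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > stop_distance:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def reclassify(points, clusters):
    """Refinement pass: assign every point to its nearest cluster centroid."""
    cents = [centroid([points[i] for i in c]) for c in clusters]
    return [
        min(range(len(cents)), key=lambda k: math.dist(p, cents[k]))
        for p in points
    ]


def cluster(rows, stop_distance=1.0):
    """Standardize, cluster agglomeratively, then reclassify; returns one label per row."""
    points = standardize(rows)
    clusters = agglomerate(points, stop_distance)
    return reclassify(points, clusters)
```

For example, `cluster([[1.0, 1000.0], [1.1, 1010.0], [0.9, 990.0], [5.0, 5000.0], [5.1, 5010.0], [4.9, 4990.0]])` gives the first three rows one label and the last three another, even though the second feature is three orders of magnitude larger than the first. The pairwise search makes this sketch cubic in the number of points, so it is suitable only for small illustrative inputs.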
