A novel similarity measure technique for clustering using multiple viewpoint based method

Data mining is the process of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. During clustering, however, dissimilar data objects may end up in the same cluster. This paper introduces a technique that makes the discovered patterns more accurate and thereby supports more accurate data analysis. The system greedily picks the next frequent item set for the next cluster. To this end, a measure that uses multiple viewpoints to assess the similarity between two data objects is introduced. Similarity between two objects can be defined explicitly or implicitly. Cosine similarity measures are used to address this problem, while the multiple viewpoints focus on similarity measures at multiple levels. These criteria are used to group the documents based on similarity. Similarity is measured both among the documents of the current cluster and against the documents of other clusters.
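
The contrast between cosine similarity and a multiple-viewpoint measure can be illustrated with a minimal sketch. The abstract does not give a formula, so the code below assumes one common formulation: two documents from the same cluster are compared by averaging the dot products of their difference vectors, taken from each viewpoint document outside that cluster. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (single viewpoint: the origin)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def multi_viewpoint_similarity(a, b, viewpoints):
    """Average of (a - h) . (b - h) over all viewpoint vectors h.

    Assumed formulation: `viewpoints` holds documents that lie outside
    the cluster containing a and b.
    """
    if not viewpoints:
        raise ValueError("at least one viewpoint is required")
    total = 0.0
    for h in viewpoints:
        total += sum((x - z) * (y - z) for x, y, z in zip(a, b, h))
    return total / len(viewpoints)


if __name__ == "__main__":
    # Hypothetical term-weight vectors for two documents in the current cluster.
    doc_a = [1.0, 0.0]
    doc_b = [0.8, 0.6]
    # Hypothetical documents from other clusters, used as viewpoints.
    outside = [[0.0, 1.0], [0.6, 0.8]]

    print("cosine:", cosine_similarity(doc_a, doc_b))
    print("multi-viewpoint:", multi_viewpoint_similarity(doc_a, doc_b, outside))
```

With a single viewpoint placed at the origin, this measure reduces to the plain dot product, which equals cosine similarity for unit-length vectors. Using documents from other clusters as viewpoints is what lets the measure reflect both the current cluster and the other cluster groups.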
