Query based Text Document Clustering using its Hypernymy Relation

of text can be organized in an unsupervised manner. In this paper, text document clustering is performed based on a query and its semantic relations. The method uses hypernymy to identify these relations, which are detected using WordNet. WordNet acts as background knowledge for the query and provides its synonymous terms. This paper proposes a new term-document matrix, called the query-based document vector model, which is constructed using a two-term query and its hypernyms. The results show that our new measure, Cluster Accuracy, is significantly better for evaluating cluster quality, and that better results are obtained.

General Terms: Text mining and Information Retrieval, Partitioning Method

Keywords: Noun, WordNet, Query-based document vector model, Hypernymy, Accuracy.
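
As an illustration only, the query-based document vector model described in the abstract can be sketched as a term-document matrix whose columns are restricted to the query terms and their hypernyms. The sketch below is not the paper's implementation: the `HYPERNYMS` table is a hypothetical, hand-written stand-in for a WordNet lookup, the function names are invented for this example, and raw term counts are assumed as the cell values because the abstract does not specify a weighting scheme.

```python
import re
from collections import Counter

# Hypothetical hand-written hypernym table standing in for a WordNet lookup.
# A real system would query WordNet for the hypernyms of each query noun.
HYPERNYMS = {
    "car": ["vehicle"],
    "apple": ["fruit"],
}


def query_features(query_terms, hypernyms=HYPERNYMS):
    """Return the query terms followed by their hypernyms, without duplicates."""
    features = []
    for term in query_terms:
        for candidate in [term] + hypernyms.get(term, []):
            if candidate not in features:
                features.append(candidate)
    return features


def query_document_matrix(documents, query_terms, hypernyms=HYPERNYMS):
    """Build one count vector per document over the query-derived features.

    Assumption: cells hold raw term counts; the source does not state
    which weighting the authors used.
    """
    features = query_features(query_terms, hypernyms)
    matrix = []
    for doc in documents:
        counts = Counter(re.findall(r"[a-z]+", doc.lower()))
        matrix.append([counts[f] for f in features])
    return features, matrix


docs = [
    "The car is a fast vehicle",
    "An apple is a fruit, and the apple is red",
    "No match here",
]
features, matrix = query_document_matrix(docs, ["car", "apple"])
# features -> ['car', 'vehicle', 'apple', 'fruit']
# matrix   -> [[1, 1, 0, 0], [0, 0, 2, 1], [0, 0, 0, 0]]
```

The resulting rows could then be passed to any partitioning method, such as k-means, in place of a full vocabulary-wide term-document matrix; restricting the columns to query-derived terms is what keeps the representation small.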
