An Efficient Text Clustering Framework

The amount of data available for analysis, such as web data, is increasing at a dramatic rate. It is therefore important to improve techniques for retrieving relevant information from this data efficiently. One such technique is text clustering, which groups text documents into clusters, for example grouping web search engine results into meaningful categories. Data mining is an area of computer science that can be defined as the extraction of useful information from large structured data sets. Text mining, on the other hand, is an extension of data mining that deals only with unstructured text data. Text clustering is thus a text mining technique. In this paper, we give an overview of text clustering, including related areas of text mining, techniques, and application areas. We also propose a framework for text clustering based on the K-Means algorithm. The paper thus offers guidance to text mining researchers on the state of the art of text clustering.
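
To make the idea of K-Means-based text clustering concrete, the following is a minimal sketch and not the framework proposed in the paper. It assumes whitespace tokenization, TF-IDF weighting with unit-length document vectors, Euclidean distance, and a deterministic farthest-first choice of initial centroids. None of these choices are stated in the abstract. The function names and the four sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn raw documents into unit-length TF-IDF vectors (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        # Smoothed inverse document frequency; one of several common variants.
        vec = [term_freq[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain K-Means; returns one cluster label per input vector."""
    # Deterministic farthest-first initialization (an assumption made here
    # so that the example is reproducible without a random seed).
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = max(vectors,
                       key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Illustrative documents only; they do not come from the paper.
docs = [
    "cats and dogs are pets",
    "dogs and cats play",
    "stocks and bonds are investments",
    "bonds and stocks fall",
]
labels = kmeans(tfidf_vectors(docs), k=2)
```

In this toy run the two pet-related documents share one label and the two finance-related documents share the other. A real pipeline would normally add stop-word removal, stemming, and a more careful choice of k.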
