Agent for Documents Clustering using Semantic-based Model and Fuzzy

Document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large sets of documents into a small number of meaningful clusters. Many clustering algorithms, such as K-means, treat each document as a bag of words. This representation is often unsatisfactory because it ignores the semantics of words. The proposed agent exploits the WordNet ontology to create a low-dimensional feature vector, which allows us to develop an efficient clustering algorithm. A new semantic-based model, which represents documents by the semantic concepts of their words, is proposed. The approach aims to improve the performance of the information retrieval process by enhancing document clustering. The accuracy and speed of clustering were examined before and after combining the ontology with the Vector Space Model (VSM). Experimental results demonstrate that using the semantic-based model together with fuzzy clustering enhances the clustering quality for sets of documents.
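The idea of mapping words to a small set of concepts and then clustering with soft memberships can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `TERM_TO_CONCEPT` is a hypothetical hand-written table standing in for a real WordNet lookup, the sample documents are invented, and fuzzy c-means is assumed as the fuzzy algorithm because the abstract does not name one. The centers are seeded deterministically from evenly spaced documents, which is a simplification rather than the paper's procedure.

```python
import math

# Hypothetical stand-in for a WordNet lookup: each term maps to one coarse concept.
TERM_TO_CONCEPT = {
    "dog": "animal", "cat": "animal", "horse": "animal",
    "car": "artifact", "truck": "artifact", "engine": "artifact",
}
CONCEPTS = sorted(set(TERM_TO_CONCEPT.values()))  # ["animal", "artifact"]


def concept_vector(text):
    """Count concepts instead of words, then L2-normalize the counts."""
    counts = [0.0] * len(CONCEPTS)
    for token in text.lower().split():
        concept = TERM_TO_CONCEPT.get(token)
        if concept is not None:
            counts[CONCEPTS.index(concept)] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100):
    """Plain fuzzy c-means with fuzzifier m; returns (memberships, centers)."""
    n, dim = len(points), len(points[0])
    # Assumption: seed the centers from evenly spaced input points.
    centers = [list(points[k * n // n_clusters]) for k in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik ** m.
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            centers[k] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ]
    return memberships, centers


docs = ["dog cat horse", "cat dog", "car truck engine", "truck car dog"]
vectors = [concept_vector(d) for d in docs]
memberships, centers = fuzzy_c_means(vectors, n_clusters=2)

for doc, row in zip(docs, memberships):
    print(doc, [round(u, 3) for u in row])
```

Each document here is reduced to two concept dimensions regardless of vocabulary size, which is the dimensionality reduction the abstract refers to. The last document mentions both concepts, so it receives a graded membership in both clusters rather than a hard assignment.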
