Semantic Oriented Text Clustering Based on RDF

Text clustering is the task of finding groups of related documents in a collection. Clustering makes a document collection easier to use. Researchers have implemented text clustering with various methods, including agglomerative, divisive, and itemset-based clustering. Most of these approaches do not take into account the semantic relationships between words; documents are treated only as bags of unrelated words. Our work aims to consider the semantics of text phrases in the clustering task so that documents can be exploited more fully. The Semantic Web offers many valuable techniques for making meaningful use of documents, and our goal is to take full advantage of them. We use the Resource Description Framework (RDF) to represent textual data as triples. These triples provide a semantic representation of the data on which the clustering process is based, with the aim of a more efficient clustering system. In addition, building on the clustering process, we incorporate ontology representation using RDF, RDF Schema (RDFS), and the Web Ontology Language (OWL) to manipulate and extract meaningful information. In this paper, we propose a framework for semantic oriented text clustering based on RDF by means of a semantic similarity measure, and we highlight the benefits of using Semantic Web techniques in clustering, topic modeling, and information extraction based on questioning, reasoning, and inference.
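
To make the idea of comparing documents through RDF-style triples concrete, the following is a minimal illustrative sketch rather than the similarity measure proposed in this paper. It assumes each document has already been reduced to a set of (subject, predicate, object) triples, and it substitutes a plain Jaccard overlap for the semantic similarity measure. The name `triple_similarity` and the toy documents are hypothetical.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object), mirroring the RDF data model.
Triple = Tuple[str, str, str]


def triple_similarity(a: Set[Triple], b: Set[Triple]) -> float:
    """Jaccard overlap between two triple sets (a stand-in measure, assumed here)."""
    if not a and not b:
        return 0.0  # two empty documents share no evidence of similarity
    return len(a & b) / len(a | b)


# Hypothetical toy documents, already converted to triples by hand.
doc_a = {("cat", "chases", "mouse"), ("mouse", "eats", "cheese")}
doc_b = {("cat", "chases", "mouse"), ("dog", "chases", "cat")}
doc_c = {("stock", "rises", "market")}

sim_ab = triple_similarity(doc_a, doc_b)  # one shared triple out of three distinct
sim_ac = triple_similarity(doc_a, doc_c)  # no shared triples
```

Unlike a bag-of-words comparison, this overlap counts a match only when subject, predicate, and object all agree, so "cat chases mouse" and "dog chases cat" are not treated as equivalent even though they share words. A clustering algorithm could then use such pairwise scores as its similarity input.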