Many text mining applications involve documents that carry a large amount of textual information, which is useful for text clustering. These documents often also contain other information, known as side information or metadata. Examples include links to other web pages, the document title, the author name, and the date of publication. Such metadata can carry considerable information for clustering, but it can also be noisy. Using side information to form clusters without filtering it can degrade cluster quality. We therefore apply an efficient feature selection method during the mining process to retain only the side information that is useful for clustering, so as to maximize the benefit of using it. The proposed system, CCSI (Co-Clustering with Side Information), uses co-clustering, also called two-mode clustering, a data mining technique that clusters the rows and columns of a matrix concurrently.
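To make the idea of clustering rows and columns concurrently more concrete, the following is a minimal sketch of one simple alternating co-clustering scheme on a toy document-by-feature matrix. It is not the CCSI algorithm itself. The function names (`co_cluster`, `seed_row_labels`), the farthest-first initialization, the mean-weight update rule, and the toy matrix (four term columns plus two hypothetical side-information columns for a publication venue) are all assumptions made for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def seed_row_labels(matrix, k):
    """Pick k seed rows farthest-first, then label each row by its nearest seed."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(matrix)) if i not in seeds]
        # Next seed: the row least similar to the seeds chosen so far.
        seeds.append(min(
            candidates,
            key=lambda i: max(cosine(matrix[i], matrix[s]) for s in seeds)))
    return [max(range(k), key=lambda c: cosine(row, matrix[seeds[c]]))
            for row in matrix]

def mean_weight(values):
    return sum(values) / len(values) if values else 0.0

def co_cluster(matrix, k, n_iter=10):
    """Alternate between relabeling columns and rows until labels stop changing."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_labels = seed_row_labels(matrix, k)
    col_labels = [0] * n_cols
    for _ in range(n_iter):
        # Column step: a column joins the row cluster where its mean weight is largest.
        new_cols = [
            max(range(k), key=lambda c: mean_weight(
                [matrix[i][j] for i in range(n_rows) if row_labels[i] == c]))
            for j in range(n_cols)
        ]
        # Row step: a row joins the column cluster where its mean weight is largest.
        new_rows = [
            max(range(k), key=lambda c: mean_weight(
                [matrix[i][j] for j in range(n_cols) if new_cols[j] == c]))
            for i in range(n_rows)
        ]
        if new_rows == row_labels and new_cols == col_labels:
            break
        row_labels, col_labels = new_rows, new_cols
    return row_labels, col_labels

# Hypothetical toy data. Columns: "cluster", "kmeans", "goal", "match",
# then two side-information columns: "venue=A", "venue=B".
matrix = [
    [2, 1, 0, 0, 1, 0],  # document 0
    [0, 0, 3, 1, 0, 1],  # document 1
    [1, 2, 0, 0, 1, 0],  # document 2
    [0, 1, 1, 2, 0, 1],  # document 3
]

row_labels, col_labels = co_cluster(matrix, k=2)
print(row_labels)  # [0, 1, 0, 1]
print(col_labels)  # [0, 0, 1, 1, 0, 1]
```

In this sketch, documents 0 and 2 are grouped together with the columns "cluster", "kmeans", and "venue=A", while documents 1 and 3 are grouped with the remaining columns, so each row cluster is paired with the column cluster that characterizes it. A noisy side-information column would pull rows toward the wrong block, which is why filtering such columns before clustering matters.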