Paper Information - Web Document Clustering for Finding Expertise in Research Area

Web Document Clustering for Finding Expertise in Research Area

Researchers often need to find expertise in their chosen area of research. Finding expertise is useful because relevant research papers can be studied and the experts identified. Finding expertise in a chosen research area has therefore long attracted interest in the academic community. Research institutions and individual researchers now make their publications and research findings available on the web. With the explosive growth of the World Wide Web, search engine users are overwhelmed by the huge volume of results returned in response to a simple query, which is far too large to extract the desired knowledge from. One way of finding expertise is therefore to cluster web documents efficiently and accurately, which enhances the integrity of the web search engine. Data mining techniques have matured, making it possible to automate web document clustering. In this paper, we present a K-Means clustering approach based on the discovery of mutually exclusive maximal frequent item sets. It has been implemented in Java. The common text processing approach is to convert the downloaded web documents into vectors. This is done by extracting document features, which produces the document-feature data set. For a set of documents, the feature set is composed of all terms appearing in any one of the documents. If document m contains feature n, then the corresponding value, in row n and column m of the table, is set to one; otherwise, it is zero. The Apriori algorithm is then applied to this document-feature data set. The mutually exclusive frequent sets generated by the Apriori algorithm are taken as the initial points of the K-Means algorithm. The output of the K-Means clustering algorithm is the sets of highly related documents that share the same features. This approach enables the clustering of web documents.
It enables researchers to find the documents related to their desired area clustered and displayed together during a web search. This significantly helps them by saving time and bringing all the relevant papers together in a cluster.
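
The pipeline described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation. The class name `DocumentFeatureSketch`, its method names, and the toy documents are all hypothetical. The two frequent item sets are hard-coded as if Apriori had already produced them. The abstract does not say how an item set becomes an initial point, so the sketch assumes the seed is the mean vector of the documents that contain every term of the item set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class DocumentFeatureSketch {

    // All distinct terms across the documents, sorted so feature indices are stable.
    static List<String> featureSet(List<List<String>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> doc : docs) terms.addAll(doc);
        return new ArrayList<>(terms);
    }

    // Binary table: matrix[n][m] is 1 when document m contains feature n, else 0.
    static int[][] buildMatrix(List<List<String>> docs, List<String> features) {
        int[][] matrix = new int[features.size()][docs.size()];
        for (int m = 0; m < docs.size(); m++)
            for (int n = 0; n < features.size(); n++)
                if (docs.get(m).contains(features.get(n))) matrix[n][m] = 1;
        return matrix;
    }

    // ASSUMPTION: the seed is the mean vector of all documents that contain
    // every feature index listed in itemSet (all zeros if no document does).
    static double[] seedCentroid(int[][] matrix, int[] itemSet) {
        int nFeatures = matrix.length, nDocs = matrix[0].length;
        double[] centroid = new double[nFeatures];
        int support = 0;
        for (int m = 0; m < nDocs; m++) {
            boolean containsAll = true;
            for (int n : itemSet) if (matrix[n][m] == 0) containsAll = false;
            if (!containsAll) continue;
            support++;
            for (int n = 0; n < nFeatures; n++) centroid[n] += matrix[n][m];
        }
        if (support > 0)
            for (int n = 0; n < nFeatures; n++) centroid[n] /= support;
        return centroid;
    }

    // Plain K-Means with squared Euclidean distance; updates centroids in place
    // and returns the cluster index assigned to each document.
    static int[] kMeans(int[][] matrix, double[][] centroids, int maxIterations) {
        int nFeatures = matrix.length, nDocs = matrix[0].length;
        int[] assignment = new int[nDocs];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int m = 0; m < nDocs; m++) {            // assignment step
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int k = 0; k < centroids.length; k++) {
                    double dist = 0;
                    for (int n = 0; n < nFeatures; n++) {
                        double diff = matrix[n][m] - centroids[k][n];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = k; }
                }
                if (assignment[m] != best) { assignment[m] = best; changed = true; }
            }
            if (!changed) break;                          // converged
            for (int k = 0; k < centroids.length; k++) {  // update step
                double[] sum = new double[nFeatures];
                int count = 0;
                for (int m = 0; m < nDocs; m++) {
                    if (assignment[m] != k) continue;
                    count++;
                    for (int n = 0; n < nFeatures; n++) sum[n] += matrix[n][m];
                }
                if (count == 0) continue;                 // keep old centroid if cluster is empty
                for (int n = 0; n < nFeatures; n++) centroids[k][n] = sum[n] / count;
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Toy documents, already tokenized (hypothetical data).
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("web", "mining", "clustering"),
            Arrays.asList("web", "clustering", "kmeans"),
            Arrays.asList("apriori", "itemset", "mining"),
            Arrays.asList("apriori", "itemset", "frequent"));
        List<String> features = featureSet(docs);
        int[][] matrix = buildMatrix(docs, features);
        // Stand-in for Apriori output: two item sets that share no term.
        int[][] itemSets = {
            { features.indexOf("web"), features.indexOf("clustering") },
            { features.indexOf("apriori"), features.indexOf("itemset") } };
        double[][] seeds = new double[itemSets.length][];
        for (int k = 0; k < itemSets.length; k++) seeds[k] = seedCentroid(matrix, itemSets[k]);
        System.out.println(Arrays.toString(kMeans(matrix, seeds, 20)));
    }
}
```

On this toy input the program prints `[0, 0, 1, 1]`: the two documents about web clustering fall into one cluster and the two about Apriori item sets into the other. Because the seeds come from item sets with no shared terms, the initial points start well separated, which is the motivation the abstract gives for using them in place of random initial points.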

T. Jaya Lakshmi | Anil Kumar Pandey | Raj Kumar
