Web Mining for Identifying Research Trends

This paper proposes a web mining approach for identifying research trends that combines several data mining techniques. Indexing Agents first search web sites, typically academic web pages, download the scientific publications they find, extract the citations, and store them in a Web Citation Database. The Temporal Document Clustering and Journal Co-Citation Clustering techniques are then applied to the Web Citation Database to generate temporal document clusters and journal clusters, respectively. A Multi-Clustering technique is proposed to mine the document and journal clusters for their inter-relationships. Finally, the knowledge mined from these inter-relationships is used to detect trends and emergent trends in a specified research area. We describe the proposed web mining approach and discuss its performance.
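As a rough illustration of the kind of input that journal co-citation clustering works on, the sketch below counts how often two journals are cited together by the same paper. It is not the paper's implementation: the record layout, the journal abbreviations, and the function name `journal_cocitation_counts` are assumptions made for this example, and the clustering step itself is omitted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records standing in for rows of a web citation database:
# each citing paper lists the journals that appear in its references.
citing_papers = [
    {"year": 2001, "cited_journals": ["JASIS", "IPM", "DMKD"]},
    {"year": 2002, "cited_journals": ["JASIS", "IPM"]},
    {"year": 2002, "cited_journals": ["IPM", "DMKD"]},
]


def journal_cocitation_counts(papers):
    """Count, for each journal pair, how many papers cite both journals."""
    counts = Counter()
    for paper in papers:
        # Deduplicate and sort so each pair has one canonical key.
        journals = sorted(set(paper["cited_journals"]))
        for a, b in combinations(journals, 2):
            counts[(a, b)] += 1
    return counts


cocitations = journal_cocitation_counts(citing_papers)
for pair, n in sorted(cocitations.items()):
    print(pair, n)
```

Pairs with high counts would be candidates for the same journal cluster; restricting `citing_papers` to a range of years before counting gives a simple way to look at how these relationships change over time.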
