Establishing guidelines on how to improve the Web site content based on the identification of representative pages

The Internet has become a major battlefield on which organizations try to retain their current clients and attract new ones. Two important weapons at their disposal are a good Web site design and content that visitors find interesting. Many tools have been developed to improve Web site content; however, it is hard to work out how the suggested changes should actually be applied, and in complex Web sites this is a non-trivial task. We propose a novel approach that helps to improve Web site content by using a Self-Organizing Feature Map (SOFM) and performing a reverse clustering analysis. This analysis gathers the most representative Web pages of a Web site, and this small set of pages then serves as a guideline for how the content enhancements should be carried out. The effectiveness of the method was tested on a real Web site.
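The abstract does not spell out the procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes each page has already been turned into a term-weight vector (for example TF-IDF weights in a vector space model), trains a small 2-D SOFM with a Gaussian neighborhood, and approximates "reverse clustering" by taking, for each occupied neuron, the page whose vector lies closest to that neuron's weights. The page names, term weights, grid size, and function names (`train_sofm`, `representative_pages`) are hypothetical and are not taken from the paper.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Grid coordinate of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda n: sq_dist(weights[n], x))


def train_sofm(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small rows x cols self-organizing feature map (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Initialize every neuron with a copy of a randomly chosen page vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for n, w in weights.items():
                grid_d2 = (n[0] - bmu[0]) ** 2 + (n[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull neuron toward the page
            step += 1
    return weights


def representative_pages(data, labels, weights):
    """Assumed 'reverse clustering': per occupied neuron, the page closest to it."""
    clusters = {}
    for idx, x in enumerate(data):
        clusters.setdefault(best_matching_unit(weights, x), []).append(idx)
    return {
        neuron: labels[min(members, key=lambda i: sq_dist(data[i], weights[neuron]))]
        for neuron, members in clusters.items()
    }


# Hypothetical pages with term weights for the terms ("product", "price", "contact").
pages = ["catalog.html", "offers.html", "prices.html",
         "contact.html", "location.html", "support.html"]
vectors = [
    [0.9, 0.4, 0.0],
    [0.8, 0.6, 0.0],
    [0.7, 0.7, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
    [0.2, 0.1, 0.8],
]
weights = train_sofm(vectors, rows=2, cols=2)
reps = representative_pages(vectors, pages, weights)
for neuron, page in sorted(reps.items()):
    print(neuron, page)
```

Because several pages collapse onto the same neuron, the output is a short list of pages, at most one per neuron, that an editor could use as models when revising the rest of the site. A real site would need many more pages, a larger vocabulary, and a grid sized to the collection.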
