A Web Page Clustering Method Based on Formal Concept Analysis

Web page clustering is an important technique for organizing network resources. By extracting features from Web pages and clustering the pages by similarity, the large amount of information they contain can be organized effectively. In this paper, after the extraction of Web feature words is described, methods for calculating feature-word weights are examined in detail. Taking Web pages as objects and Web feature words as attributes, a formal context is constructed so that formal concept analysis can be applied. An algorithm for constructing a concept lattice based on cross data links is proposed and successfully applied. The resulting concept lattice hierarchy is then used to cluster the Web pages. Experimental results indicate that the proposed algorithm outperforms previous competing methods in both time consumption and clustering quality.
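To make the notion of a formal context concrete, the following is a minimal sketch in Python, assuming a toy set of three pages with hand-picked feature words (the names `pages`, `common_attributes`, `objects_having`, and `formal_concepts` are hypothetical and not taken from the paper). It enumerates formal concepts by naively closing every subset of objects, which is exponential in the number of pages. It is not the cross-data-link construction algorithm proposed in the paper, and it ignores feature-word weights.

```python
from itertools import combinations

# Hypothetical toy context: each Web page (object) maps to its feature words (attributes).
pages = {
    "p1": {"lattice", "concept"},
    "p2": {"lattice", "concept", "web"},
    "p3": {"web", "cluster"},
}

def common_attributes(objs, context):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    result = set().union(*context.values())
    for o in objs:
        result &= context[o]
    return result

def objects_having(attrs, context):
    """Objects whose attribute set contains every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def formal_concepts(context):
    """Naive enumeration: close every subset of objects into an (extent, intent) pair."""
    concepts = set()
    objs = sorted(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts(pages)

# Print concepts from the largest extent (most general cluster) to the smallest.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each extent can be read as a candidate cluster of pages and the matching intent as the feature words that describe it. Ordering concepts by extent inclusion yields the lattice hierarchy that the clustering relies on.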
