Semantic Web and Web Page Clustering Algorithms: A Landscape View

The semantic web has evolved chiefly toward exchanging data between applications across all domains of activity. Based on this vision, applications have recently been designed in fields such as community web portals, social networking, e-learning, and multimedia retrieval. With the growing number of web services, clustering of web resources has become a valuable tool for semantic web mining. Clustering internet objects such as web pages calls for new methods that group correlated content, both to make that content easier to understand and to organize the massive result sets that user queries return in web page search. Hence, web page clustering algorithms should be able to handle massive, irregular content and discover knowledge regardless of web page complexity. These algorithms vary with the characteristics and data types of the content they target. Choosing the most appropriate algorithm is consequently not easy, as it should be accurate while remaining efficient in time and space complexity. Therefore, this paper rigorously surveys the most important algorithms of different types used for web page clustering. In addition, a comparative analysis of these algorithms is provided in terms of several parameters. Finally, a brief discussion explains why web page clustering is important in the emerging era of Semantic Web of Things (SWoT) applications.
