The Power of AI in IoT: Cognitive IoT-based Scheme for Web Spam Detection

In the modern era, the Internet of Things (IoT) plays an important role in connecting people across the globe. IoT objects communicate and exchange data with one another irrespective of their geographical locations. In such an environment, the Web of Things (WoT) provides Internet services to IoT objects. The Internet is mostly accessed through search engines, and the success of a search engine depends on its ranking algorithm. Although Google is preferred by most Internet users, its ranking algorithm, PageRank, still suffers from spam web pages. In this paper, a web page filtering algorithm is proposed that automatically detects spam web pages before they are processed by the ranking module of a search engine. A machine learning model, namely a decision tree, is used to validate the proposed scheme. Ten-fold cross-validation is used to assess the model, which reaches an accuracy of 98.2%. The results demonstrate that the proposed scheme can prevent spam web pages in a Cognitive Internet of Things (CIoT) environment.
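The evaluation protocol named in the abstract (a decision tree scored with ten-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the two page features (keyword density and outbound link ratio), the synthetic data, and all function names are hypothetical, and a depth-1 tree (a decision stump) stands in for a full decision tree to keep the example dependency-free.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_stump(X, y):
    """Fit a depth-1 decision tree (stand-in for a full tree).

    Tries every midpoint between consecutive distinct values of each
    feature and keeps the (feature, threshold) pair with the fewest
    training errors. The stump predicts 1 (spam) when x[feature] > threshold.
    """
    best = None
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            errors = sum((row[f] > t) != bool(label) for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    if best is None:
        raise ValueError("every feature is constant; no split is possible")
    return best[1], best[2]

def predict(stump, row):
    """Classify one page: 1 = spam, 0 = legitimate."""
    f, t = stump
    return int(row[f] > t)

def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        stump = fit_stump([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(stump, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def make_toy_pages(n=200, seed=1):
    """Synthetic pages (assumption, not real data).

    Feature 0: keyword density, high for spam. Feature 1: outbound
    link ratio, pure noise here.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        spam = i % 2
        density = rng.uniform(0.6, 1.0) if spam else rng.uniform(0.0, 0.4)
        X.append([density, rng.random()])
        y.append(spam)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_pages()
    print(f"10-fold accuracy: {cross_val_accuracy(X, y):.3f}")
```

Each page is held out exactly once, so the reported score is an average over predictions on data the classifier did not see during fitting. The toy data is separable by construction, so the score it produces says nothing about the 98.2% reported in the paper.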
