A personalized web page content filtering model based on segmentation

Given the explosive growth of content on the World Wide Web from diverse sources, content filtering tools have become essential. Filtering web page content is especially important when the pages are accessed by minors. Traditional web page blocking systems take a Boolean approach: they either display the full page or block it completely. As web pages have become more dynamic, it is now common for different portions of a page to hold different types of content at different times. This paper proposes a model that blocks content at a fine-grained level: instead of blocking the whole page, it blocks only those segments that hold the content to be blocked. The advantages of this method over traditional methods are the finer granularity of blocking and the automatic identification of the portions of the page to be blocked. Experiments conducted on the proposed model indicate 88% accuracy in filtering out the segments.
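
The idea of blocking individual segments rather than whole pages can be illustrated with a minimal Python sketch. It is not the paper's model: the abstract does not specify the segmentation algorithm, the classifier, or the format of a user's profile, so the blank-line segmenter, the keyword-count rule, and the names `BLOCKED_TERMS`, `segment_page`, `should_block`, and `filter_page` are all assumptions made for illustration.

```python
import re

# Hypothetical per-user blocklist; the abstract does not describe the profile format.
BLOCKED_TERMS = {"casino", "betting"}


def segment_page(text):
    """Naive stand-in for a real page segmenter: split the text on blank lines."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]


def should_block(segment, blocked_terms, threshold=1):
    """Flag a segment when it contains at least `threshold` blocked terms.

    A keyword count is an assumed placeholder for whatever classifier
    a real system would use.
    """
    words = re.findall(r"[a-z]+", segment.lower())
    hits = sum(1 for w in words if w in blocked_terms)
    return hits >= threshold


def filter_page(text, blocked_terms, placeholder="[segment blocked]"):
    """Return the page's segments, replacing only the flagged ones."""
    return [
        placeholder if should_block(seg, blocked_terms) else seg
        for seg in segment_page(text)
    ]


if __name__ == "__main__":
    page = (
        "Local weather: sunny today.\n\n"
        "Try our casino and betting offers!\n\n"
        "Sports scores from last night."
    )
    for seg in filter_page(page, BLOCKED_TERMS):
        print(seg)
```

In this sketch, a traditional Boolean filter would correspond to blocking the entire page as soon as any segment is flagged, whereas the segment-level version leaves the other two segments visible.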
