CaSePer: An efficient model for personalized web page change detection based on segmentation

Abstract Users who revisit a web page at frequent intervals are more interested in the recent changes that have occurred on the page than in its entire contents. Because web pages are increasingly dynamic, identifying these changes manually is difficult. This paper proposes an enhanced model for detecting changes in web pages, called CaSePer (Change detection based on Segmentation with Personalization). Change detection is performed at a finer granularity by segmenting the web page, and it is made efficient by a two-step process. The proposed method reduces the complexity of change detection by focusing only on the segments in which changes have occurred. User-specific, personalized change detection is also incorporated into the model. The model is validated with a prototype implementation, and experiments conducted on the prototype confirm a 77.8% improvement and a 97.45% accuracy rate.
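The idea of restricting change detection to the segments that actually changed can be illustrated with a minimal Python sketch. It is not the CaSePer implementation: the abstract does not say what the two steps are, so the sketch assumes a digest comparison per segment as the first step and a line-level diff inside the changed segments as the second. The segment ids, the `interests` set standing in for personalization, and all function names are hypothetical.

```python
import difflib
import hashlib
from typing import Optional


def segment_digest(text: str) -> str:
    """Fingerprint a segment's text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_segments(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Assumed step 1: ids of segments that were added, removed, or whose digest differs."""
    ids = sorted(set(old) | set(new))
    return [
        seg_id
        for seg_id in ids
        if seg_id not in old
        or seg_id not in new
        or segment_digest(old[seg_id]) != segment_digest(new[seg_id])
    ]


def detect_changes(
    old: dict[str, str],
    new: dict[str, str],
    interests: Optional[set[str]] = None,
) -> dict[str, list[str]]:
    """Assumed step 2: diff only the changed segments the user is interested in.

    `interests` is a hypothetical stand-in for a user profile; None means
    every changed segment is reported.
    """
    report: dict[str, list[str]] = {}
    for seg_id in changed_segments(old, new):
        if interests is not None and seg_id not in interests:
            continue
        before = old.get(seg_id, "").splitlines()
        after = new.get(seg_id, "").splitlines()
        # ndiff marks removed lines with "- " and added lines with "+ ";
        # unchanged lines and "? " hint lines are dropped.
        report[seg_id] = [
            line
            for line in difflib.ndiff(before, after)
            if line.startswith(("+ ", "- "))
        ]
    return report
```

In this sketch, unchanged segments cost one digest comparison each, and the more expensive diff runs only where a digest differs, which is the kind of saving the abstract attributes to segment-level detection.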
