Robust web page segmentation for mobile terminal using content-distances and page layout information

The demand for browsing general Web pages on mobile phones is increasing. However, since most Web pages on the Internet are optimized for browsing from PCs, mobile phone users find it difficult to obtain sufficient information from the Web. A method that reconstructs PC-optimized Web pages for mobile phone users is therefore essential. One approach is to segment a Web page based on its structure and use the hierarchy of its content elements to regenerate a page suited to mobile phone browsing. In our previous work, we examined a robust automatic Web page segmentation scheme that measures the distance between content elements from the relative HTML tag hierarchy, i.e., the number and depth of HTML tags in a Web page. However, a content-distance based on the order of HTML tags does not always match the intuitive distance between content elements in the actual layout of the page. In this paper, we propose a hybrid segmentation method that segments Web pages using both the content-distance calculated by the previous scheme and a novel approach that exploits Web page layout information. Experiments evaluating segmentation accuracy show that the proposed method segments Web pages more accurately than conventional methods. Furthermore, an implementation and evaluation of our system on a mobile phone show that our method offers better usability than commercial Web browsers.
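The abstract does not give the exact distance definitions, so the following is only a rough Python sketch of the hybrid idea under stated assumptions. It assumes a tag distance equal to the number of DOM tree edges between two content elements through their lowest common ancestor, a layout distance equal to the pixel gap between their rendered bounding boxes, and a hybrid distance that is a weighted sum of the two. The names `Element`, `tag_distance`, `layout_distance`, `hybrid_distance`, and `segment`, along with the parameters `alpha`, `tag_scale`, `px_scale`, and `threshold`, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch: these distance definitions are illustrative assumptions,
# not the formulas used in the paper.


@dataclass
class Element:
    """A content element (DOM leaf) with its tree position and rendered box."""
    path: Tuple[int, ...]                   # child indices from the root, e.g. (0, 2, 1)
    box: Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def tag_distance(a: Element, b: Element) -> int:
    """Number of DOM tree edges between a and b via their lowest common ancestor."""
    common = 0
    for x, y in zip(a.path, b.path):
        if x != y:
            break
        common += 1
    return (len(a.path) - common) + (len(b.path) - common)


def layout_distance(a: Element, b: Element) -> float:
    """Shortest Euclidean gap between two rectangles; 0 if they overlap or touch."""
    dx = max(0.0, max(a.box[0], b.box[0]) - min(a.box[2], b.box[2]))
    dy = max(0.0, max(a.box[1], b.box[1]) - min(a.box[3], b.box[3]))
    return (dx * dx + dy * dy) ** 0.5


def hybrid_distance(a: Element, b: Element, alpha: float = 0.5,
                    tag_scale: float = 10.0, px_scale: float = 100.0) -> float:
    """Weighted sum of the two distances, each divided by an assumed scale."""
    return (alpha * tag_distance(a, b) / tag_scale
            + (1.0 - alpha) * layout_distance(a, b) / px_scale)


def segment(elements: List[Element], threshold: float = 0.5,
            **kw) -> List[List[Element]]:
    """Walk elements in document order and start a new segment whenever the
    hybrid distance to the previous element exceeds the threshold."""
    if not elements:
        return []
    segments = [[elements[0]]]
    for prev, cur in zip(elements, elements[1:]):
        if hybrid_distance(prev, cur, **kw) > threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


title = Element(path=(0, 0, 0), box=(0, 0, 600, 40))
body = Element(path=(0, 0, 1), box=(0, 50, 600, 200))
aside = Element(path=(0, 0, 2), box=(0, 1500, 200, 1600))  # DOM sibling drawn far below
page = [title, body, aside]
print([len(s) for s in segment(page, alpha=1.0)])  # tag distance only -> [3]
print([len(s) for s in segment(page, alpha=0.5)])  # hybrid -> [2, 1]
```

With `alpha=1.0` only the tag hierarchy counts. As a result, an element that is a DOM sibling but is rendered far down the page stays in the same segment. Once layout distance is mixed in, that element is split off, which is the kind of tag-order/layout mismatch the hybrid method is meant to address.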
