The Spanish Web in Numbers - Main Features of the Spanish Hidden Web
This article presents a study of the web sites in the “.es” domain, focusing on how widely they use technologies that make it hard for crawling systems to traverse the Web. The study centres on scripts and forms in HTML pages, since both are well-known entry points to the “Hidden Web”. For scripts, it pays special attention to redirection and to the dynamic construction of URLs. The article concludes that a crawler should process these technologies in order to reach most of the documents on the Web.
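To make the two entry points concrete, the following is a minimal sketch of how a page could be checked for forms and for script-based redirection. It is an illustration only, not the method used in the study: the class `HiddenWebScanner`, the function `scan`, and the pattern `REDIRECT_RE` are hypothetical names, and the regular expression is a crude heuristic that would miss many real redirection idioms and does not detect dynamically built URLs in general.

```python
import re
from html.parser import HTMLParser

# Assumption: a rough heuristic for script redirection, e.g.
# `window.location = ...` or `location.replace(...)`. Not exhaustive.
REDIRECT_RE = re.compile(
    r"(?:window\.|document\.)?location(?:\.href)?\s*="
    r"|location\.(?:replace|assign)\s*\("
)


class HiddenWebScanner(HTMLParser):
    """Counts forms, scripts, and scripts that appear to redirect."""

    def __init__(self):
        super().__init__()
        self.forms = 0
        self.scripts = 0
        self.script_redirects = 0
        self._script_buf = None  # collects text while inside <script>

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms += 1
        elif tag == "script":
            self.scripts += 1
            self._script_buf = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them.
        if self._script_buf is not None:
            self._script_buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._script_buf is not None:
            body = "".join(self._script_buf)
            if REDIRECT_RE.search(body):
                self.script_redirects += 1
            self._script_buf = None


def scan(html):
    """Return counts of the features found in one HTML document."""
    scanner = HiddenWebScanner()
    scanner.feed(html)
    scanner.close()
    return {
        "forms": scanner.forms,
        "scripts": scanner.scripts,
        "script_redirects": scanner.script_redirects,
    }


if __name__ == "__main__":
    page = (
        "<html><head><script>window.location = '/b?id=' + id;</script></head>"
        "<body><form action='/search'><input name='q'></form></body></html>"
    )
    print(scan(page))
```

A crawler that only follows static `<a href>` links would find no outgoing links on the sample page, even though it contains both a form and a script that builds a target URL at run time.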