Paper Information - Domain Identification and Classification of Web Pages Using Artificial Neural Network

A huge amount of data has lately been made available on the WWW [3], most of which remains inaccessible to conventional Web crawlers because the pages are generated dynamically in response to users' queries submitted through Web-based search form interfaces [5, 6, 9]. A Hidden Web crawler must be able to automatically annotate such Hidden Web data. This goal can be accomplished only if the crawler has been provided with knowledge or data pertaining to a domain similar to that of the search form interface. The paper seeks to provide a solution by exploiting the information present in the HTML structure of Web pages, efficiently obtaining domain-specific data that facilitates the crawler's access to dynamic Web pages through automatic processing of these search form interfaces. Identifying the domain of a Web page further eases the organization and understanding of Web content.

Komal Kumar Bhatia | Sonali Gupta

[1] Neel Sundaresan, et al. A classifier for semi-structured documents, 2000, KDD '00.

[2] Martin Bergman, et al. The deep web: surfacing the hidden value, 2000.

[3] Martin van den Berg, et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery, 1999, Comput. Networks.

[4] Giuseppe Attardi, et al. Automatic Web Page Categorization by Link and Context Analysis, 1999.

[5] John M. Pierre, et al. Practical Issues for Automated Categorization of Web Sites, 2000.

[6] M. Indra Devi, et al. Feature Selection for Web Page Classification, 2009.

[7] Brian D. Davison, et al. Knowing a web page by the company it keeps, 2006, CIKM '06.

[8] Marco Gori, et al. Focused Crawling Using Context Graphs, 2000, VLDB.

[9] Sriram Raghavan, et al. Crawling the Hidden Web, 2001, VLDB.

[10] Giles, et al. Searching the world wide Web, 1998, Science.

[11] Michael K. Bergman. White Paper: The Deep Web: Surfacing Hidden Value, 2001.

[12] Komal Kumar Bhatia, et al. On The Automated Classification of Web Pages Using Artificial Neural Network, 2012.