Among the most widely discussed technologies are the Internet and its associated environment, the World Wide Web. Web technology has broad support among entrepreneurs and technicians alike. The Web environment is owned and managed by the corporation. It may be outsourced, but in most cases the Web is a normal part of computer operations and is often used as a hub for the integration of business systems. An interaction occurs when the Web creates a transaction to execute a client order, for example. The transaction is formatted and sent to the corporate systems, where it is processed like any other order. In this sense, the Web is not just another source of incoming business transactions. There is also a decision support database, kept separate from the organization's operational data: the data warehouse. This is a huge database containing historical data that are summarized and consolidated. In fact, the data warehouse provides the basis for the functioning of a Web-based e-commerce environment. This document focuses on two subjects: the relevance of a page to a particular domain, and the page content with respect to the search keywords. Both are used to improve the quality of the URLs to be scheduled, thereby avoiding irrelevant or low-quality ones. We need to build a vertical search engine that receives seed URLs and orders the URLs to crawl based on page content, so that the crawl stays within a specific area, such as the medical or financial domain.
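The URL scheduling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword set, the `relevance` scoring rule, the `threshold` value, the `fetch` callable, and the in-memory `TOY_WEB` are all assumptions introduced here for illustration. The sketch keeps a priority queue of URLs, visits the most promising one first, and schedules outgoing links only from pages judged relevant to the target domain.

```python
import heapq

# Hypothetical domain vocabulary; the source does not specify one.
DOMAIN_KEYWORDS = {"medical", "clinic", "patient", "diagnosis"}


def relevance(text, keywords=DOMAIN_KEYWORDS):
    """Fraction of words in `text` that are domain keywords (0.0 if empty)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in keywords for w in words) / len(words)


def crawl(seed_urls, fetch, max_pages=10, threshold=0.1):
    """Visit pages best-first and follow links only from relevant pages.

    `fetch(url)` is an assumed callable returning (text, outgoing_links).
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    relevant = []
    while frontier and len(relevant) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        score = relevance(text)
        if score < threshold:
            continue  # irrelevant page: its links are never scheduled
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant


# Toy in-memory "web" standing in for real HTTP fetches (an assumption).
TOY_WEB = {
    "seed": ("medical clinic directory", ["a", "b"]),
    "a": ("patient diagnosis guide", ["c"]),
    "b": ("cheap flights and hotels", ["d"]),
    "c": ("clinic opening hours", []),
    "d": ("holiday deals", []),
}

if __name__ == "__main__":
    print(crawl(["seed"], TOY_WEB.get))
```

In this toy graph, page `b` scores zero against the medical vocabulary, so its link to `d` is never scheduled. A real crawler would replace `TOY_WEB.get` with an HTTP fetcher and would likely use a richer relevance model than keyword frequency.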