Anchor point indexing in Web document retrieval

Traditional World Wide Web search engines, such as AltaVista.com, index and recommend individual Web pages to assist users in locating relevant documents. As the Web grows, however, the number of matching pages increases at a tremendous rate. Users are often overwhelmed by the large answer set recommended by the search engines. Also, if a matching document is a hypertext, the document structure is destroyed and the individual pages that compose the document are returned instead. The logical starting point of the hyperdocument is thus hidden among the large basket of matching pages. Users need to spend a lot of effort browsing through the pages to locate the starting point, a very time consuming process. This paper studies the anchor point indexing problem. The set of anchor points of a given user query is a small set of key pages from which the larger set of documents that are relevant to the query can be easily reached. The use of anchor points helps solve the problems of huge answer set and low precision suffered by most search engines by considering the hyperlink structures of the relevant documents, and by providing a summary view of the result set.

[1]  Daniel S. Weld,et al.  Intelligent Agents on the Internet: Fact, Fiction, and Forecast , 1995, IEEE Expert.

[2]  L. R. Rasmussen,et al.  In information retrieval: data structures and algorithms , 1992 .

[3]  Susan Feldman,et al.  "Just the Answers, Please": Choosing a Web Search Service. , 1997 .

[4]  Hsinchun Chen,et al.  Knowledge-based document retrieval: framework and design , 1992, J. Inf. Sci..

[5]  Dik Lun Lee,et al.  Document Ranking and the Vector-Space Model , 1997, IEEE Softw..

[6]  David Wai-Lok Cheung,et al.  Intelligent agents for matching information providers and consumers on the World-Wide-Web , 1997, Proceedings of the Thirtieth Hawaii International Conference on System Sciences.

[7]  Luis Gravano,et al.  The Effectiveness of GlOSS for the Text Database Discovery Problem , 1994, SIGMOD Conference.

[8]  Yoav Shoham,et al.  Learning Information Retrieval Agents: Experiments with Automated Web Browsing , 1995 .

[9]  David Wai-Lok Cheung,et al.  Recommending anchor points in structure-preserving hypertext document retrieval , 1998, Proceedings. The Twenty-Second Annual International Computer Software and Applications Conference (Compsac '98) (Cat. No.98CB 36241).

[10]  Jakob Nielsen,et al.  The art of navigating through hypertext , 1990, CACM.

[11]  K. J. Lynch,et al.  Automatic construction of networks of concepts characterizing document databases , 1992, IEEE Trans. Syst. Man Cybern..

[12]  Ben Kao,et al.  Title Anchor point indexing in web document retrieval , 2000 .

[13]  Henry Lieberman,et al.  Letizia: An Agent That Assists Web Browsing , 1995, IJCAI.

[14]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[15]  Vijay V. Raghavan,et al.  Information Retrieval on the World Wide Web , 1997, IEEE Internet Comput..

[16]  Luis Gravano,et al.  The Efficacy of GlOSS for the Text Database Discovery Problem , 1993, SIGMOD 1993.

[17]  Reinier Post,et al.  Information Retrieval in the World-Wide Web: Making Client-Based Searching Feasible , 1994, Comput. Networks ISDN Syst..

[18]  Luis Gravano,et al.  Precision and recall of GlOSS estimators for database discovery , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[19]  Luis Gravano,et al.  Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies , 1995, VLDB.