Finding scientific papers with HomePageSearch and MOPS

The fast dissemination of new research results on the World Wide Web poses new challenges for search engines. In this paper we describe a new approach to finding scientific papers relevant to a pre-defined research area. Unlike other approaches, we do not search for web pages that contain certain keywords; instead, we search for web pages created by scientists who are active in the research area under consideration. The names of these scientists are obtained from the DBLP server [9]. The HomePageSearch system finds the scientists' home pages from these names, and MOPS finds research papers close to those home pages, creates an index of the papers, and makes it accessible on the web. We conclude that such focused crawling is very effective for building high-quality collections and indices of scientific papers using ordinary desktop hardware.
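The step of finding research papers "close to" a home page can be illustrated with a small sketch. The abstract does not specify how closeness is measured, so the code below assumes it means a bounded number of link hops from the home page, and it assumes papers are recognized by file suffix. The function name `find_papers`, the `max_depth` parameter, the suffix list, and the in-memory link graph are all hypothetical and are not taken from the paper.

```python
from collections import deque

# Assumption: papers are recognized by file suffix (not stated in the source).
PAPER_SUFFIXES = (".ps", ".ps.gz", ".pdf")


def find_papers(home_page, links, max_depth=2):
    """Collect paper URLs reachable within max_depth link hops of home_page.

    `links` maps a page URL to the list of URLs it links to. It stands in
    for fetching and parsing real pages, so the sketch runs offline.
    """
    seen = {home_page}
    queue = deque([(home_page, 0)])  # (page URL, hops from the home page)
    papers = []
    while queue:
        url, depth = queue.popleft()
        for target in links.get(url, []):
            if target in seen:
                continue
            seen.add(target)
            if target.lower().endswith(PAPER_SUFFIXES):
                papers.append(target)
            elif depth + 1 < max_depth:
                # Only follow pages that can still lead to a paper in range.
                queue.append((target, depth + 1))
    return papers


# Hypothetical link graph around one scientist's home page.
HOME = "http://example.org/~alice/"
LINKS = {
    HOME: [HOME + "pubs.html", HOME + "intro.pdf"],
    HOME + "pubs.html": [HOME + "papers/a.ps.gz", HOME + "far.html"],
    HOME + "far.html": [HOME + "papers/deep.pdf"],
}

if __name__ == "__main__":
    for paper in find_papers(HOME, LINKS, max_depth=2):
        print(paper)
```

With `max_depth=2` the sketch returns the paper linked directly from the home page and the one on the publications page, but not the paper three hops away. Restricting the crawl to a small neighbourhood of each home page is what keeps it focused and cheap enough for desktop hardware.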

[1]  Andrew McCallum, et al. Using Reinforcement Learning to Spider the Web Efficiently, 1999, ICML.

[2]  Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[3]  Alf-Christian Achilles, Paul Ortyl, et al. The Collection of Computer Science Bibliographies, 1995.

[4]  C. Lee Giles, et al. Digital Libraries and Autonomous Citation Indexing, 1999, Computer.

[5]  Martin van den Berg, et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery, 1999, Comput. Networks.

[6]  Udi Manber, et al. GLIMPSE: A Tool to Search Through Entire File Systems, 1994, USENIX Winter.

[7]  Anil K. Jain, et al. Classification of text documents, 1998, Proceedings of the Fourteenth International Conference on Pattern Recognition.

[8]  Oren Etzioni, et al. Dynamic Reference Sifting: A Case Study in the Homepage Domain, 1997, Comput. Networks.