Applying an information gathering architecture to Netfind: a white pages tool for a changing and growing Internet

The Internet is quickly becoming an indispensable means of communication and collaboration, based on applications such as electronic mail, remote information retrieval, and multimedia conferencing. A fundamental problem for such applications is supporting resource discovery in a way that keeps pace with the Internet's exponential growth in size and diversity. Netfind is a scalable tool that locates current electronic mail addresses and other information about Internet users. Since we first deployed Netfind in 1990, it has evolved considerably, making use of more types of information sources as well as more sophisticated mechanisms to gather and cross-correlate information. In this paper, we describe these techniques and present a general framework for gathering and harnessing widely distributed information in a diverse and growing Internet environment. At present, Netfind gathers information from 17 different types of sources, providing a particularly thorough demonstration of an information gathering architecture.