A hierarchical model of a FTP search engine with applications

Abstract Traditional FTP search engines usually rely on centralised spiders to collect data, so their major weakness is poor temporal effectiveness (the freshness of the indexed data). To solve this problem, this paper presents an efficient hierarchical FTP search engine model that deploys spider agents on node hosts of specific networks to collect file data from FTP servers. The key technologies are a regional responsibility mechanism, a search mechanism based on asynchronous retrieval, and a PAT tree storage mechanism. Simulation shows that the response time is shorter than that of the traditional system and that the temporal effectiveness reaches a level suitable for practical application. In addition, the test results show that the architecture scales well.
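To illustrate the kind of index the storage mechanism refers to, the following is a minimal sketch of a PAT array (a suffix array, the simpler sibling of the PAT tree) built over file names. It is not the paper's implementation: the function names `build_pat_array` and `search`, the sample file names, and the case-insensitive matching are all assumptions made for illustration.

```python
from bisect import bisect_left


def build_pat_array(names):
    """Return a sorted list of (suffix, name_index) pairs over all file names.

    Hypothetical helper: storing full suffix strings costs quadratic memory
    per name, which is acceptable for short file names in a sketch.
    """
    entries = []
    for idx, name in enumerate(names):
        lowered = name.lower()  # assumption: matching is case-insensitive
        for start in range(len(lowered)):
            entries.append((lowered[start:], idx))
    entries.sort()
    return entries


def search(entries, names, query):
    """Return the sorted file names that contain `query` as a substring."""
    q = query.lower()
    # Binary search for the first suffix >= q. Every suffix that starts
    # with q sits in one contiguous run beginning at this position.
    pos = bisect_left(entries, (q, -1))
    hits = set()
    while pos < len(entries) and entries[pos][0].startswith(q):
        hits.add(entries[pos][1])
        pos += 1
    return sorted(names[i] for i in hits)


# Illustrative data only; a real spider agent would supply the listing.
names = ["linux-kernel.tar.gz", "report.pdf", "Kernel_notes.txt"]
index = build_pat_array(names)
print(search(index, names, "kernel"))
```

Because every substring of a name is a prefix of one of its suffixes, a single binary search over the sorted suffixes answers a substring query without scanning every file name. A PAT tree organises the same suffixes as a compressed trie rather than a sorted array.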
