A Twice-Gathering Information System and Research on the Information Fingerprint HashTrie

This paper presents a prototype of a twice-gathering interactive information system for e-government, designed for the case where the Intranet and the Extranet are physically isolated. Users in the Extranet use client software to gather links to the latest relevant information, which a web alert service has previously collected from the Internet. The information is then carried by ferry-type transport devices to a storage device, synchronized with the web platform in the Intranet, and browsed by Intranet users. During both information gathering in the Extranet and data synchronization in the Intranet, repeated gathering and copying must be avoided; this is done by comparing the information fingerprints extracted from the web pages. The prototype stores these fingerprints in a HashTrie. In testing, the HashTrie-based structure is 2.28 times faster than Darts (a double-array Trie), which is the fastest structure in the existing patent application. Twelve existing high-speed hash functions are also implemented for use with HashTrie. When the dictionary contains more than 5×10^5 words, the PJWHash or SuperFastHash function can be adopted; when the dictionary contains 10^5 words, the CalcStrCR32 and ELFHash functions can be adopted.
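
The fingerprint-based duplicate check described above can be illustrated with a minimal sketch. The abstract does not specify the HashTrie layout or the fingerprint algorithm, so everything here is an assumption made for illustration: the fingerprint is taken to be an MD5 digest of whitespace-normalized page text, and the HashTrie is modeled as a hash table of buckets (indexed by ELFHash) in which each bucket is the root of a character trie. The names `fingerprint`, `HashTrie`, and `num_buckets` are hypothetical.

```python
import hashlib


def elf_hash(key: str) -> int:
    """Classic ELF hash over the UTF-8 bytes of key, kept to 32 bits."""
    h = 0
    for byte in key.encode("utf-8"):
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def fingerprint(text: str) -> str:
    """Assumed fingerprint: MD5 hex digest of whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class HashTrie:
    """Illustrative model only: hash buckets whose entries are trie roots."""

    _END = ""  # end-of-key marker; cannot collide with a one-character edge

    def __init__(self, num_buckets: int = 1024):
        self.buckets = [None] * num_buckets

    def add(self, key: str) -> bool:
        """Insert key; return True if it was new, False if already stored."""
        idx = elf_hash(key) % len(self.buckets)
        if self.buckets[idx] is None:
            self.buckets[idx] = {}
        node = self.buckets[idx]
        for ch in key:
            node = node.setdefault(ch, {})
        if self._END in node:
            return False
        node[self._END] = True
        return True

    def __contains__(self, key: str) -> bool:
        node = self.buckets[elf_hash(key) % len(self.buckets)]
        if node is None:
            return False
        for ch in key:
            node = node.get(ch)
            if node is None:
                return False
        return self._END in node


# Usage: skip a page whose fingerprint has already been gathered.
store = HashTrie()
for page in ["Budget report 2010", "Budget   report 2010", "Traffic notice"]:
    if store.add(fingerprint(page)):
        print("gather:", page)
    else:
        print("skip duplicate:", page)
```

In this sketch the hash function only selects the bucket, so swapping `elf_hash` for another function (for example a PJWHash implementation) changes how keys spread across buckets without changing which keys are reported as duplicates.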