Application of Heritrix in Vertical Search Platform of Electronic Information

This paper introduces the basic concepts of vertical search engines and web crawlers, describes the architecture of the Heritrix system, and analyzes the Heritrix workflow. To address some shortcomings of Heritrix, this project designs a method for crawling a specific type of information in a directed (focused) manner, and introduces the ELFHash algorithm. By extending Heritrix, multi-threaded crawling of information is implemented in the telecommunications information search platform, providing an information source for building a vertical search engine for electronic information.
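The abstract names the ELFHash algorithm but does not define it or state exactly how it is used. The Java sketch below shows the classic ELF string hash (the one from the System V ELF object format), computed over the UTF-8 bytes of a string with 32-bit unsigned arithmetic emulated in a `long`. As an assumption not confirmed by the abstract, it also shows one plausible role for the hash in a multi-threaded crawl: mapping each URL to one of a fixed number of crawl queues, so that separate worker threads handle disjoint queues. The class `ElfHashDemo`, the method `queueIndex`, the queue count and the sample URLs are all hypothetical and are not part of the Heritrix API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Heritrix.
public class ElfHashDemo {

    // Classic ELF hash over the UTF-8 bytes of s, emulating C's 32-bit
    // unsigned arithmetic with a long. The result always fits in 28 bits.
    public static long elfHash(String s) {
        long h = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Shift in the next byte (as unsigned) and keep only 32 bits.
            h = ((h << 4) + (b & 0xFF)) & 0xFFFFFFFFL;
            long g = h & 0xF0000000L;   // top 4 bits of the 32-bit word
            if (g != 0) {
                h ^= g >>> 24;          // fold the high nibble back into bits 4..7
            }
            h &= ~g;                    // clear the top 4 bits
        }
        return h;
    }

    // Assumed use (not stated in the source): spread URLs over numQueues
    // crawl queues, e.g. one queue per worker thread.
    public static int queueIndex(String url, int numQueues) {
        return (int) (elfHash(url) % numQueues);
    }

    public static void main(String[] args) {
        // Hypothetical seed URLs for illustration only.
        String[] urls = {
            "http://www.example.com/news/1.html",
            "http://www.example.com/news/2.html",
            "http://www.example.org/products/index.html"
        };
        int numQueues = 8; // hypothetical queue count
        for (String url : urls) {
            System.out.println(url + " -> queue " + queueIndex(url, numQueues));
        }
    }
}
```

Because the hash is deterministic, the same URL always lands in the same queue, which keeps per-queue bookkeeping simple; whether the original system used ELFHash for queue assignment, URL deduplication, or something else is not specified in the abstract.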