A Design of a Sci-Tech Information Retrieval Platform Based on Apache Solr and Web Mining

To serve the needs of high-tech companies and let them obtain sci-tech information more quickly and efficiently, a sci-tech information retrieval platform is proposed. The platform has four parts: the web spider, the Solr engine, the SQL Server 2008 database, and the client. Each part handles one core concern, and this separation makes the whole system more flexible, scalable, and fault tolerant. The web spider collects sci-tech information from the Internet; the Solr engine indexes the documents gathered by the web spider; the SQL Server database stores all user information and the configuration of the whole system; and the client provides several REST-like APIs for modifying the configuration and retrieving the latest information in the platform.
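
As a rough illustration of how the spider and the client might talk to the Solr engine, the sketch below builds the two HTTP requests involved: one that posts crawled documents to Solr's JSON update handler and one that runs a keyword search through the standard select handler. The paper does not give these details, so the host address, the core name `scitech`, the document fields, and both function names are assumptions made for the example, not the platform's actual interface.

```python
import json
from urllib.parse import urlencode

# Assumed values for illustration only; the paper does not specify them.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "scitech"  # hypothetical core name


def build_update_request(docs):
    """Return (url, body) for sending crawled documents to Solr's update handler.

    `docs` is a list of dicts; the JSON array form is accepted by /update.
    """
    url = f"{SOLR_BASE}/{CORE}/update?commit=true"
    body = json.dumps(docs)
    return url, body


def build_select_url(query, rows=10):
    """Return the URL for a keyword search against Solr's /select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{CORE}/select?{params}"


if __name__ == "__main__":
    # Hypothetical document as the web spider might produce it.
    crawled = [{"id": "doc-1", "title": "Solr indexing notes", "url": "http://example.com/a"}]
    update_url, update_body = build_update_request(crawled)
    print("POST", update_url)
    print(update_body)
    print("GET ", build_select_url("title:solr", rows=5))
```

The functions only construct the requests, so the sketch runs without a Solr server; sending them would require an HTTP client and a running instance with a matching core and schema.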
