The System Architecture for the Basic Information of Science and Technology Experts Based on Distributed Storage and Web Mining

To build an efficient basic information system for science and technology experts based on Web mining, this paper proposes a novel application system architecture. The proposed architecture integrates a spider module, local distributed storage, and MongoDB. The basic information about science and technology experts that appears on websites is classified into two formats, and each format is handled with its own strategy. Extraction of normalized text from the Web pages located by the target URLs is proposed. The extracted results include the name, sex, date of birth, hometown, and professional title of each expert. The data flow, the information management model for users and for science and technology experts, the management model for target websites and their URLs, and the data processing module are described in detail. For the application system, synchronization of multiple databases and a replica set architecture for the sharded cluster are proposed. Experiments show that the application system is highly efficient, and the results show that the proposed architecture can satisfy customers' application requirements.
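The extraction step named in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the normalized text holds one `label: value` pair per line; the abstract does not specify the text format, and the label names, the `ExpertRecord` type, and the `extract_expert` function are all hypothetical rather than the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpertRecord:
    """The five fields the abstract lists as extraction results."""
    name: Optional[str] = None
    sex: Optional[str] = None
    birth: Optional[str] = None
    hometown: Optional[str] = None
    title: Optional[str] = None


# Assumed line format: "<label>: <value>" (ASCII or full-width colon).
# The labels are hypothetical; a real site would need its own mapping.
_FIELD_LINE = re.compile(
    r"^\s*(name|sex|birth|hometown|title)\s*[:：]\s*(.+?)\s*$",
    re.IGNORECASE,
)


def extract_expert(normalized_text: str) -> ExpertRecord:
    """Fill an ExpertRecord from normalized text, keeping the first value per field."""
    record = ExpertRecord()
    for line in normalized_text.splitlines():
        match = _FIELD_LINE.match(line)
        if match is None:
            continue  # ignore lines that carry no recognized label
        key = match.group(1).lower()
        if getattr(record, key) is None:
            setattr(record, key, match.group(2))
    return record


if __name__ == "__main__":
    # Invented sample input, not data from the paper.
    sample = "Name: Jane Doe\nSex: female\nTitle: Professor"
    print(extract_expert(sample))
```

Fields that do not appear in the text stay `None`, so a downstream storage step can decide whether an incomplete record is worth keeping.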
