Design and implementation of the patent topical web crawler system

To provide a knowledge source for patent-based computer-aided innovative product design, a patent topical web crawler was designed and developed that targets the patent information of the US Patent and Trademark Office (USPTO). In this paper, we describe the overall design and workflow of the patent topical crawler, including its basic functional architecture and key system technologies, and propose a Doc2Vec-based method for calculating patent short-text similarity. The method is used to judge whether a patent is relevant to the topic and can effectively screen the required patent data. Experimental results show that this patent topical web crawler has high acquisition efficiency and applicability.
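
As an illustration of the relevance discrimination step, the following is a minimal sketch of similarity-based screening, not the system's actual implementation. It assumes each patent is represented by a short text such as its title, that relevance is decided by comparing cosine similarity against a threshold, and that the threshold value of 0.3 is arbitrary. The names `embed`, `cosine`, `is_relevant`, and `filter_patents` are hypothetical. To keep the sketch runnable without a trained model, `embed` is a bag-of-words stand-in; in the approach described above it would be replaced by a Doc2Vec document vector.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a sparse term-count vector.

    Assumption: the described system would use a Doc2Vec document vector
    here; term counts are used only so the sketch runs without a model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def is_relevant(topic, short_text, threshold=0.3):
    """Return True if the short text is similar enough to the topic.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return cosine(embed(topic), embed(short_text)) >= threshold


def filter_patents(topic, patents, threshold=0.3):
    """Keep only the patent records whose title passes the relevance check."""
    return [p for p in patents if is_relevant(topic, p["title"], threshold)]


if __name__ == "__main__":
    topic = "web crawler for patent retrieval"
    patents = [
        {"title": "Topical web crawler for patent data"},
        {"title": "Retinal vessel segmentation method"},
    ]
    for patent in filter_patents(topic, patents):
        print(patent["title"])
```

Keeping the embedding behind a single function means the screening logic stays the same when the stand-in is swapped for a learned model; only `embed` and `cosine` would need to operate on dense vectors instead of term counts.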
