Accessing United States Bulk Patent Data with patentpy and patentr

The United States Patent and Trademark Office (USPTO) provides publicly accessible bulk data files containing information for all patents from 1976 onward. However, the format of these files has changed over time and is memory-inefficient, which can pose problems for individual researchers. Here, we introduce the patentpy and patentr packages for the Python and R programming languages, respectively. They allow users to programmatically fetch bulk data from the USPTO website and access it locally in a cleaned, rectangular format. Research that depends on United States patent data would benefit from the use of patentpy and patentr. We describe the packages' implementation and quality control mechanisms, and present use cases highlighting simple yet effective applications of this software.
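To illustrate what a cleaned, rectangular format means in practice, the following minimal sketch maps records that use different field names onto a single fixed set of columns and writes them as CSV. It does not use the patentpy or patentr API. The field aliases, column names, sample records, and the helper functions `to_rectangular` and `write_csv` are all assumptions made for illustration only.

```python
import csv
import io

# Hypothetical mapping from format-specific field names to one shared schema.
# These aliases are illustrative assumptions, not the packages' actual mapping.
FIELD_ALIASES = {
    "WKU": "patent_number",
    "TTL": "title",
    "doc-number": "patent_number",
    "invention-title": "title",
}
COLUMNS = ["patent_number", "title"]


def to_rectangular(records):
    """Return one row per record, each with exactly the keys in COLUMNS."""
    rows = []
    for record in records:
        row = dict.fromkeys(COLUMNS, "")  # missing fields become empty strings
        for key, value in record.items():
            column = FIELD_ALIASES.get(key)
            if column is not None:  # fields outside the schema are dropped
                row[column] = value.strip()
        rows.append(row)
    return rows


def write_csv(rows):
    """Serialize rectangular rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up records imitating two differently structured sources.
sample = [
    {"WKU": "0000001", "TTL": " Example title A "},
    {"doc-number": "0000002", "invention-title": "Example title B", "extra": "ignored"},
]
rows = to_rectangular(sample)
csv_text = write_csv(rows)
```

Because every row carries the same columns regardless of the source format, the resulting table can be loaded directly into a data frame in either Python or R.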
