Topic based classification and pattern identification in patents

Patent classification systems and citation networks are used extensively in innovation studies. However, these conventional approaches to patent analysis have two limitations: classification codes do not map uniquely onto specific products or markets, and citation linkages alone do not accurately capture knowledge flows. We present a hierarchical technique, based on natural language processing, that automatically identifies and classifies the patents in a dataset into technology areas and sub-areas. The key novelty of our technique is its use of topic modeling to map patents to probability distributions over real-world categories (topics). We test the accuracy and usefulness of the technique on a dataset of 10,201 solar photovoltaics patents filed with the United States Patent and Trademark Office (USPTO) between 2002 and 2013. We show that linguistic features from topic models can be used to effectively identify the main technology area to which a patent's invention applies. Our computational experiments support the view that the topic distribution of a patent offers a reduced-form representation of its knowledge content. Accordingly, we suggest that this hidden thematic structure in patents can be useful in studies of the policy–innovation–geography nexus. To that end, we also demonstrate an application of our technique to identifying patterns in technological convergence.
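To make the central idea concrete, the sketch below maps each document to a probability distribution over topics and then reads off the most probable topic as its main area. It is a minimal illustration and not the authors' pipeline: it assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling, which the abstract does not name, and the function names (`fit_lda`, `main_topic`), the hyperparameters, and the tokenized toy documents are all hypothetical.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed model choice).

    docs: list of token lists. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens of word v in topic k
    topic_total = [0] * n_topics                            # all tokens in topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the full conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


def main_topic(distribution):
    """Return the index of the most probable topic."""
    return max(range(len(distribution)), key=lambda t: distribution[t])


# Hypothetical tokenized toy documents, not data from the paper.
toy_docs = [
    ["silicon", "cell", "wafer", "doping", "cell", "silicon"],
    ["inverter", "grid", "converter", "voltage", "grid", "inverter"],
    ["wafer", "silicon", "doping", "cell", "junction"],
    ["converter", "voltage", "inverter", "grid", "phase"],
]

theta = fit_lda(toy_docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row], "main topic:", main_topic(row))
```

Each row of `theta` is the reduced-form representation the abstract refers to: a short vector of topic weights standing in for the full text. Topic indices are arbitrary labels, so attaching a real-world technology area to each topic still requires inspecting the words it favors.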
