Measuring the drafting alignment of patent documents using text mining

How can an inventor, entrepreneur, investor, or patent examiner quantify the extent to which the inventive claims listed in a patent document align with the patent specification? Because a specification that is poorly aligned with the inventive claims can render an invention unpatentable and can invalidate an already issued patent, an effective measure of alignment is necessary. We define a novel measure of drafting alignment using Latent Dirichlet Allocation (LDA). The measure is computed for each patent document by first identifying the latent topics underlying the claims and the specification, and then using the Hellinger distance to quantify the proximity between their topical coverages. We demonstrate the measure on data processing patent documents related to cybersecurity. We further investigate the properties of the proposed measure using exploratory data analysis and show that alignment is generally positively associated with prior patenting efforts as well as with the tendency to include figures in a document.
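As an illustration of the distance step only, the following minimal sketch computes the Hellinger distance between two topic distributions, one for the claims and one for the specification of a single patent. The topic proportions are made-up values standing in for LDA output, and the function names are hypothetical. Reporting alignment as one minus the distance is an assumption made here for readability, not necessarily the paper's exact definition.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared_diff = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared_diff / 2.0)


def alignment_score(claims_topics, spec_topics):
    """Assumed convention: alignment = 1 - Hellinger distance (higher is better)."""
    return 1.0 - hellinger_distance(claims_topics, spec_topics)


# Illustrative topic proportions (not real data) over three latent topics.
claims_topics = [0.70, 0.20, 0.10]
spec_topics = [0.60, 0.25, 0.15]

print(round(alignment_score(claims_topics, spec_topics), 3))
```

In practice the two input vectors would come from a fitted topic model, for example the per-document topic proportions that an LDA implementation returns for the claims text and the specification text separately.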
