Improving Patent Mining and Relevance Classification using Transformers

Patent analysis and mining are time-consuming and costly processes for companies, yet they are essential for those that wish to remain competitive. To cope with the overload caused by the sheer volume of patents, the idea is to filter them automatically so that only a few are brought to experts for review. This paper reports a successful application of fine-tuning and retraining pre-trained deep Natural Language Processing models for patent classification. The solution we propose combines several state-of-the-art treatments to achieve our goal: decreasing the workload while preserving recall and precision.
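
To make the fine-tuning step concrete, the sketch below adapts a pre-trained transformer encoder to binary patent-relevance classification with HuggingFace's Transformers library. The backbone name, file names, column names, and hyper-parameters are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal sketch: fine-tune a pre-trained transformer for binary
# patent-relevance classification (relevant vs. not relevant to an expert).
# Checkpoint, data files, and hyper-parameters below are assumptions.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "roberta-base"  # assumed backbone; any BERT-like checkpoint works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV files with a "text" column (e.g. title + abstract + claims)
# and a binary "label" column (1 = relevant, 0 = not relevant).
dataset = load_dataset(
    "csv",
    data_files={"train": "patents_train.csv", "validation": "patents_dev.csv"},
)

def tokenize(batch):
    # Truncate to the encoder's maximum input length; very long patents may
    # call for a long-document model such as Longformer instead.
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="patent-relevance",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)

trainer.train()
```

In practice, the evaluation would track recall and precision rather than plain accuracy, since the stated goal is to reduce the experts' workload without missing relevant patents.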
