A Robust Classification Framework for Medical Patents Based on Deep Learning

With the rapid development of bioinformatics and pharmaceutical engineering, pharmaceutical companies and research institutes are paying increasing attention to intellectual property protection via medical patents. As a result, classifying massive numbers of medical patents accurately without manual intervention has become an important challenge for both academia and industry. To address it, we propose a deep learning based classification framework for medical patents, which consists of three components: text processing, feature extraction, and prototype clustering. Unlike existing classification methods based on machine learning, the proposed framework is robust to external samples while still guaranteeing high precision. In detail, for text processing, a professional medical text thesaurus is built via the GloVe method, which can learn more of the specialized vocabulary of the medical field. For feature extraction, a hybrid deep learning model is proposed to extract the features of patent texts, which integrates a one-dimensional convolutional neural network (CNN) and two bidirectional long short-term memory (Bi-LSTM) networks. In addition, we propose an improved distance-based center loss function (DCL). Finally, extensive experiments are conducted on a Chinese medical patent dataset provided by a company. The results demonstrate that the proposed method is significantly superior in classification precision and robustness compared with other existing multi-class classification methods.
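To make the idea of prototype-based classification with a center loss more concrete, the following is a minimal sketch in plain Python. It is not the DCL proposed in this paper, whose exact form is not given in the abstract. It shows only the standard center loss (half the mean squared distance between each feature vector and its class center) together with a nearest-prototype rule that rejects samples lying far from every center. The class names, the two-dimensional feature vectors, the helper names, and the rejection radius are all hypothetical values chosen for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center_loss(features, labels, centers):
    """Standard center loss: half the mean squared distance of each
    feature vector to the center (prototype) of its own class.
    Note: this is the classic formulation, not the paper's improved DCL."""
    total = sum(squared_distance(f, centers[y]) for f, y in zip(features, labels))
    return 0.5 * total / len(features)


def classify(feature, centers, reject_radius):
    """Assign the label of the nearest prototype, or return None when the
    sample is farther than reject_radius from every prototype.
    The rejection rule is an assumption used to illustrate how external
    (out-of-class) samples could be handled."""
    label, dist = min(
        ((y, math.sqrt(squared_distance(feature, c))) for y, c in centers.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= reject_radius else None


# Hypothetical prototypes and extracted features (2-D for readability).
centers = {"class_a": [0.0, 0.0], "class_b": [4.0, 0.0]}
features = [[0.0, 1.0], [4.0, 2.0]]
labels = ["class_a", "class_b"]

loss = center_loss(features, labels, centers)
in_class = classify([0.5, 0.5], centers, reject_radius=1.5)
external = classify([2.0, 9.0], centers, reject_radius=1.5)
```

In this toy setting, minimizing the loss pulls features toward their class centers, so a fixed radius around each center can separate known classes from external samples. In a real system the features would come from the trained CNN and Bi-LSTM extractor rather than being written by hand.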