A Method of Collecting Four-Character Medicine Effect Phrases in TCM Patents Based on Semi-supervised Learning

For historical reasons and because of established writing habits, the effects of medicines in Traditional Chinese Medicine (TCM) patents are often described with four-character phrases. Chinese word segmentation systems do not readily identify these phrases, which greatly degrades the results of patent analysis and mining. This paper proposes a semi-supervised learning method for collecting four-character effect phrases from the abstract texts of TCM patents. The collected phrases can enrich the lexicons of Chinese word segmentation systems and support semantic patent retrieval and analysis. The experimental results show the validity of the method.
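
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a seed-driven (bootstrapping) collection loop of this general kind might look, not the paper's actual method. It assumes that effect phrases tend to appear in lists separated by the enumeration comma "、", and that a list containing one known phrase is likely to contain others. The function names `candidates` and `collect_phrases`, the list heuristic, and the sample sentences are all hypothetical.

```python
import re

def candidates(clause):
    """Return four-character items from a "、"-separated list (assumed heuristic)."""
    items = clause.split("、")
    if len(items) < 2:
        return []
    # Boundary items carry surrounding words, so keep only the four
    # characters adjacent to the list separator.
    items[0] = items[0][-4:]
    items[-1] = items[-1][:4]
    return [item for item in items if len(item) == 4]

def collect_phrases(texts, seeds, max_rounds=10):
    """Grow a phrase set from a few labeled seeds over unlabeled texts."""
    known = set(seeds)
    # Split each text into clauses on sentence-level punctuation.
    clauses = [c for t in texts for c in re.split(r"[，。；！？,.;!?]", t) if c]
    for _ in range(max_rounds):
        added = set()
        for clause in clauses:
            cands = set(candidates(clause))
            # Trust a list only if it already contains a known phrase.
            if cands & known:
                added |= cands - known
        if not added:
            break  # no new phrases were found in this round
        known |= added
    return known

texts = [
    "本发明具有清热解毒、活血化瘀、消肿止痛的功效。",
    "该药物可消肿止痛、祛风除湿、舒筋活络，疗效显著。",
    "原料包括金银花、连翘、甘草。",
]
phrases = collect_phrases(texts, {"清热解毒"})
print(sorted(phrases))
```

Starting from the single seed, the first sentence contributes two new phrases, one of which then unlocks the list in the second sentence on the next round. The ingredient list in the third sentence shares no phrase with the known set, so nothing is taken from it. A real system would need a stronger filter than list co-occurrence to keep noise out of the lexicon.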
