Automatic patent document summarization for collaborative knowledge systems and services

Engineering and research teams often develop new products and technologies by referring to inventions described in patent databases. Efficient patent analysis builds R&D knowledge, shortens new product development time, increases market success, and reduces the risk of patent infringement. It is therefore beneficial to extract information from patent documents automatically and systematically in order to improve knowledge sharing and collaboration among R&D team members. In this research, patents are summarized using a combined ontology-based and TF-IDF concept clustering approach. The ontology captures the general knowledge and core meaning of patents in a given domain. The proposed methodology then extracts, clusters, and integrates the content of a patent to derive a summary and a cluster tree diagram of key terms. Patents from the International Patent Classification (IPC) codes B25C, B25D, and B25F (categories for power hand tools) and B24B, C09G, and H01L (categories for chemical mechanical polishing) are used as case studies. The evaluation reports statistics on three measures: the compression ratio of the generated summaries, the retention ratio of the ontology-based keyword extraction, and the classification accuracy of the summaries. The results show that the ontology-based approach yields about the same compression ratio as previous non-ontology-based research but yields, on average, an 11% improvement in the retention ratio and a 14% improvement in classification accuracy.
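
As a rough illustration of the quantities named above, the following minimal sketch computes TF-IDF term weights together with a compression ratio and a retention ratio. The abstract does not give the exact formulas, so the definitions here are assumptions: TF-IDF is taken as relative term frequency times log(N / document frequency), the compression ratio as summary length over original length in words, and the retention ratio as the share of the original's key terms that still appear in the summary. The function names (`tokenize`, `tf_idf`, `compression_ratio`, `retention_ratio`) are hypothetical, and the ontology and clustering steps are not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: (term count / document length) * log(N / df),
    where df is the number of documents containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def compression_ratio(summary, original):
    """Assumed definition: summary word count over original word count.

    The original text is assumed to be non-empty.
    """
    return len(tokenize(summary)) / len(tokenize(original))


def retention_ratio(summary, original_key_terms):
    """Assumed definition: fraction of the original key terms found in the summary.

    The key term list is assumed to be non-empty.
    """
    summary_terms = set(tokenize(summary))
    kept = [t for t in original_key_terms if t in summary_terms]
    return len(kept) / len(original_key_terms)
```

Under these assumed definitions, a lower compression ratio means a shorter summary and a higher retention ratio means more of the key terms survive, so a summarizer is judged on how well it keeps the second high while holding the first low.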
