Using genre-specific features for patent summaries

Targeted summarization technique for patent material. Segment as intra-sentence summarization unit. Exploitation of lexical chains across the whole patent document. Full-fledged text generation techniques for summarization.

Patent search is recall-driven, which goes hand in hand with at least a partial sacrifice of precision. As a consequence, patent analysts regularly have to view and examine a large number of patents, which implies a very high workload. Interactive analysis aids that help to minimize this workload are thus in high demand. Still, these aids do not reduce the amount of material to be examined; they only facilitate its examination. The amount can be reduced by working with patent summaries instead of full patent documents. So far, high-quality patent summaries are produced mainly manually, and only a few research works address the problem of automatic patent summarization. Most often, these works either replicate the summarization metrics known from general discourse summarization or focus on the claims of a patent. However, neither strategy is adequate: state-of-the-art general discourse summarization techniques are of limited use due to the idiosyncrasies of the patent genre, and techniques that focus only on the claims produce summaries that miss important details, provided in the other sections, about the components of the invention introduced in the claims. We propose a patent summarization technique that takes the idiosyncrasies of the patent genre (such as the unbalanced distribution of the content across the different sections of a patent, the excessive length of the sentences in the claims, abstract vocabulary, etc.) into account to obtain a comprehensive summary of the invention. In particular, we make use of lexical chains in the claims and in the description of the invention, and of aligned claim-description segments at the subsentential level, to assess the relevance of the individual fragments of the document for the summary. The most relevant fragments are selected and merged using full-fledged natural language generation techniques.
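
The idea of scoring fragments by the lexical chains they take part in can be illustrated with a minimal sketch. It is not the method of the paper: a "chain" is reduced here to a term repeated in both the claims and the description, whereas real lexical chains also rely on synonymy and coreference, and the generation step is left out entirely. The names `build_chains`, `score_segment`, `select_segments`, and the stopword list are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "for",
             "with", "that", "by", "on", "said"}


def tokens(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def build_chains(claims, description):
    """Toy stand-in for lexical chains.

    A term forms a chain only if it occurs in both the claims and the
    description; the chain weight is its total number of occurrences.
    """
    in_claims = Counter(tokens(claims))
    in_description = Counter(tokens(description))
    return {term: in_claims[term] + in_description[term]
            for term in in_claims if term in in_description}


def score_segment(segment, chains):
    """Sum the weights of the distinct chain terms found in a segment."""
    return sum(chains[t] for t in set(tokens(segment)) if t in chains)


def select_segments(segments, chains, k):
    """Keep the k highest-scoring segments, in their original order."""
    ranked = sorted(segments, key=lambda s: score_segment(s, chains),
                    reverse=True)
    top = set(ranked[:k])
    return [s for s in segments if s in top]


if __name__ == "__main__":
    claims = ("A valve comprising a housing and a spring, "
              "wherein the spring biases the valve.")
    segments = ["The spring is made of steel",
                "The housing encloses the valve",
                "The drawings are schematic"]
    chains = build_chains(claims, ". ".join(segments))
    for seg in select_segments(segments, chains, k=2):
        print(score_segment(seg, chains), seg)
```

Segments that mention no term shared with the claims receive a score of zero and are dropped first, which mirrors, in a very reduced form, the intuition that description fragments elaborating on claimed components are the ones worth keeping.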
