Retrieval of Patent Documents from Heterogeneous Sources Using Ontologies and Similarity Analysis

In the past few years, the volume of scientific and legal information related to the patent system has grown explosively. Patents and related documents are siloed in multiple heterogeneous sources, and retrieving relevant information across them is a non-trivial task that poses many technical challenges. One such challenge is inconsistent terminology across documents. We address it by exploiting domain knowledge through ontology standards. Furthermore, we take advantage of cross-references and structural dependencies between the information sources to enhance terminological comparison. In this paper, we present a similarity analysis methodology that combines knowledge from two distinct kinds of ontology, (1) domain ontologies and (2) ontologies that describe the information sources, to help a user identify relevant documents across several information sources simultaneously. Specifically, we explore the use of a rule-based system to infer relationships between documents based on pre-defined heuristics. We present our results through a use case in the bio-patent domain with a collection of 1150 patents and 30 court cases.
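As a rough illustration of the idea summarized above, the following minimal sketch normalizes terms through a toy ontology before comparing two documents, then applies one heuristic rule that raises the score when the documents cross-reference each other. It is not the method of the paper: the ontology entries, the document fields (`id`, `terms`, `cites`), the Jaccard measure, and the `boost` value are all assumptions made for the example.

```python
# Hypothetical toy ontology: surface term -> canonical concept.
ONTOLOGY = {
    "epo": "erythropoietin",
    "hematopoietin": "erythropoietin",
    "erythropoietin": "erythropoietin",
}


def concepts(terms):
    """Map each term to its canonical concept; unknown terms map to themselves."""
    return {ONTOLOGY.get(t.lower(), t.lower()) for t in terms}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(doc_a, doc_b, boost=0.2):
    """Concept-level similarity plus one assumed heuristic rule."""
    base = jaccard(concepts(doc_a["terms"]), concepts(doc_b["terms"]))
    # Assumed rule: a direct cross-reference in either direction adds a
    # fixed boost, capped at 1.0.
    linked = doc_b["id"] in doc_a["cites"] or doc_a["id"] in doc_b["cites"]
    return min(1.0, base + boost) if linked else base


# Hypothetical documents: a patent and a court case that cites it.
patent = {"id": "P1", "terms": ["EPO", "cell"], "cites": []}
case = {"id": "C1", "terms": ["erythropoietin", "host"], "cites": ["P1"]}

print(round(similarity(patent, case), 3))
```

Without the ontology lookup, "EPO" and "erythropoietin" would not match and the two documents would share no terms. With it they share one concept, and the cross-reference rule then raises the score further.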
