Supporting Patent Mining by using Ontology-based Semantic Annotations

Semantic Web approaches appear promising for supporting content mining over the millions of patents accessible through the Web. In this paper, we describe our approach for generating semantic annotations on patents, based on a semantic representation of patent documents. The annotations are derived both from the structure of the documents and from their textual contents, processed by Natural Language Processing (NLP) tools. This method, primarily aimed at helping biologists use patent information, can be generalized to other domains and to other kinds of structured documents.
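The two sources of annotations described above (document structure and processed text) can be illustrated with a small sketch. This is not the authors' pipeline. Every name here is a hypothetical stand-in: the XML field names (`number`, `title`, `date`, `inventors`, `abstract`), the `example.org` namespaces, the toy `TERM_TO_CONCEPT` vocabulary and the sample record. The NLP step is also replaced by naive whole-word term matching. Structural fields become property triples, and ontology terms found in the abstract become concept annotations.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespaces, chosen for illustration only.
PAT = "http://example.org/patent#"
ONTO = "http://example.org/bio-onto#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Toy stand-in for a domain ontology: surface term -> concept URI.
TERM_TO_CONCEPT = {
    "insulin": ONTO + "Insulin",
    "diabetes": ONTO + "Diabetes",
    "monoclonal antibody": ONTO + "MonoclonalAntibody",
}

def annotate_patent(xml_text):
    """Return (subject, predicate, object) triples for one patent record."""
    root = ET.fromstring(xml_text)
    subject = PAT + root.findtext("number", default="unknown")
    triples = [(subject, RDF_TYPE, PAT + "Patent")]

    # 1. Structural annotations: map known XML fields to properties.
    for field, prop in (("title", "hasTitle"), ("date", "hasDate")):
        value = root.findtext(field)
        if value:
            triples.append((subject, PAT + prop, value.strip()))
    for inventor in root.findall("inventors/inventor"):
        triples.append((subject, PAT + "hasInventor", (inventor.text or "").strip()))

    # 2. Content annotations: whole-word matching of ontology terms in the
    #    abstract, a crude placeholder for real NLP processing.
    abstract = (root.findtext("abstract") or "").lower()
    for term, concept in TERM_TO_CONCEPT.items():
        if re.search(r"\b" + re.escape(term) + r"\b", abstract):
            triples.append((subject, PAT + "mentionsConcept", concept))
    return triples

# Fictitious sample record, invented for this example.
SAMPLE = """<patent>
  <number>US0000000</number>
  <title>Method for treating diabetes</title>
  <date>2005-01-01</date>
  <inventors><inventor>A. Smith</inventor></inventors>
  <abstract>A formulation of insulin for patients with diabetes.</abstract>
</patent>"""

if __name__ == "__main__":
    for triple in annotate_patent(SAMPLE):
        print(triple)
```

Keeping the structural and textual passes separate mirrors the distinction drawn in the abstract. A fuller system would emit RDF against a real domain ontology rather than plain tuples.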
