Automatic patent classification using citation network information: an experimental study in nanotechnology

Classifying and organizing documents in repositories is an active research topic in digital library studies. Manually classifying the large volume of patents and patent applications managed by patent offices is a labor-intensive task. To automate this process, many previous studies have classified patents based on their contents. In this research we propose to use patent citation information, especially citation network structure information, to address the patent classification problem. We adopt a kernel-based approach and design kernel functions to capture content information and various kinds of citation-related information in patents. We evaluate the performance of these kernels on a testbed of patents related to nanotechnology. Evaluation results show that our proposed labeled citation graph kernel, which uses citation network structures, significantly outperforms the kernels that use no citation information or only direct citation information.
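To make the kernel-based idea concrete, the following is a minimal sketch of how a content kernel and a citation-based kernel might be combined into one similarity function. It is an illustration under stated assumptions, not the kernels evaluated in this study: the function names, the `text` and `cites` fields, the equal weighting, and the choice of shared cited patents as the citation signal are all hypothetical, and the sketch does not implement the labeled citation graph kernel.

```python
import math
from collections import Counter


def content_kernel(text_a, text_b):
    """Cosine similarity between bag-of-words term counts (assumed content kernel)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def citation_kernel(cited_a, cited_b):
    """Cosine-normalized count of cited patents shared by two patents.

    This uses only direct citation information; it is a stand-in, not the
    labeled citation graph kernel proposed in the study.
    """
    if not cited_a or not cited_b:
        return 0.0
    return len(cited_a & cited_b) / math.sqrt(len(cited_a) * len(cited_b))


def composite_kernel(p, q, weight=0.5):
    """Weighted sum of the two kernels; `weight` is a hypothetical tuning parameter."""
    return weight * content_kernel(p["text"], q["text"]) + (
        1.0 - weight
    ) * citation_kernel(p["cites"], q["cites"])


# Hypothetical toy patents; the identifiers in `cites` are made up.
p1 = {"text": "carbon nanotube synthesis", "cites": {"A", "B"}}
p2 = {"text": "carbon nanotube transistor", "cites": {"B", "C"}}

similarity = composite_kernel(p1, p2)
```

A weighted sum with non-negative weights is used here because it preserves the kernel property of its components, so the resulting similarity matrix could in principle be passed to any kernel-based classifier that accepts precomputed kernels.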
