Exploiting Images for Patent Search

Patent offices worldwide receive large numbers of patent documents that describe and protect innovative artifacts, processes, algorithms, and other inventions. Apart from the main text description, these documents may contain figures, drawings, and diagrams that help explain the patented object. This chapter presents two main directions: concept-based and content-based patent retrieval. Concept-based search uses both textual and visual information and combines them by late fusion at the classification stage. Concepts are extracted using classification techniques such as support vector machines and random forests. Content-based retrieval, in contrast, relies on the shape and content information of patent images, captured by visual descriptors extracted from binary images. Adaptive hierarchical density histograms serve as a binary image retrieval technique that combines efficiency with effectiveness while remaining compact, and can therefore handle large binary image databases. Given the vast number of images included in patent documents, it is important for patent experts to be able to examine them when trying to understand the contents of a patent and identify relevant inventions. Patent experts would therefore benefit greatly from a tool that supports efficient patent image retrieval and extends standard figure browsing and metadata-based retrieval with content-based search following the query-by-example paradigm.
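As a rough illustration of content-based, query-by-example retrieval over binary images, the sketch below computes a simplified hierarchical density descriptor and ranks database images by their distance to a query. It is not the adaptive hierarchical density histogram described in the chapter: the centroid-based quadrant split, the use of plain region densities as features, the L1 distance, and the names `density_histogram` and `rank_by_example` are all assumptions made for this example.

```python
def density_histogram(image, levels=2):
    """Simplified hierarchical density descriptor (illustrative assumption).

    image: list of rows, each a list of 0/1 values (1 = foreground pixel).
    At each level, every region is split into four quadrants around the
    centroid of its foreground pixels, and the share of all foreground
    pixels falling in each quadrant is recorded.
    """
    points = [(r, c) for r, row in enumerate(image)
              for c, value in enumerate(row) if value]
    total = len(points)
    features = []
    regions = [points]
    for _ in range(levels):
        next_regions = []
        for pts in regions:
            quadrants = [[], [], [], []]
            if pts:
                # Centroid of the foreground pixels in this region.
                centre_r = sum(p[0] for p in pts) / len(pts)
                centre_c = sum(p[1] for p in pts) / len(pts)
                for r, c in pts:
                    index = (2 if r >= centre_r else 0) + (1 if c >= centre_c else 0)
                    quadrants[index].append((r, c))
            for quadrant in quadrants:
                features.append(len(quadrant) / total if total else 0.0)
                next_regions.append(quadrant)
        regions = next_regions
    return features


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_example(query_image, database, levels=2):
    """Return database keys ordered from most to least similar to the query."""
    query = density_histogram(query_image, levels)
    return sorted(
        database,
        key=lambda name: l1_distance(query, density_histogram(database[name], levels)),
    )


# Toy 4x4 binary images standing in for patent drawings.
square = [[1, 1, 1, 1] for _ in range(4)]
bar = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

ranking = rank_by_example(bar, {"square": square, "bar": bar})
print(ranking)
```

The descriptor length grows as 4 + 16 + ... with the number of levels, so it stays small for shallow hierarchies, which is in the spirit of the compactness the chapter attributes to its descriptor. In a real system the descriptors would be computed once and indexed rather than recomputed for every query.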
