Discovering Semantic Relationships Among PDF Book Figures Using Contextual and Visual Similarity

Figures in books help readers understand facts, statistics, objects, and concepts. Searching a collection of books for figures on a specific topic, together with their associated text, is time-consuming. The figure-search solutions currently available in the literature are limited to scientific documents and are inapplicable to books because of their large size. These solutions are also domain-dependent, unimodal, context-ignorant, and restricted to their local repository; as a result, they return many irrelevant search results and retrieve figures with limited accompanying information. Therefore, a generic, bimodal, dual-repository figure retrieval system is proposed that retrieves figures from books along with their contextual information, improving the user's understanding. The system can establish semantic relationships among PDF book figures and can also relate figures to images on the web. It retrieves figures with 91-96.5% precision and makes navigating and searching figures an effective and pleasant experience.
