A model-driven classification and recursive segmentation method for automatic panel extraction from biological and medical papers

We present a novel method for automatically extracting panels from figures in biomedical articles. The method consists of figure (or panel) classification and panel segmentation. Figure classification determines whether a figure contains photographs. A Gaussian model is constructed for photographs and plots, and figures and panels are evaluated against the model to determine their class. If a figure contains photographs, an iterative panel-splitting process follows, which continues until no further straight lines are identified in the subfigures. Experiments were conducted with 182 figures from 25 articles published in different journals. Despite the wide variation among figures, our method successfully extracted both plots and photographs and was able to identify zoom-in views superimposed on the original photographs.
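
The recursive splitting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how straight lines are detected, so the sketch assumes that panels are separated by uniformly white gutters spanning the full width or height of the region being split. The names `split_panels`, `BACKGROUND`, and the helper functions are hypothetical. The Gaussian classification step and the detection of superimposed zoom-in views are omitted.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale pixels, one inner list per row
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive

BACKGROUND = 255  # assumption: separators are pure white gutters


def _interior_runs(blank: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) runs of blank lines that do not touch either border."""
    runs, start = [], None
    for i, is_blank in enumerate(blank):
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(blank)))
    # Runs touching the border are margins, not separators between panels.
    return [(s, e) for s, e in runs if s > 0 and e < len(blank)]


def _segments(gaps: List[Tuple[int, int]], length: int) -> List[Tuple[int, int]]:
    """Return the (start, end) ranges that remain between the separator gaps."""
    out, prev = [], 0
    for s, e in gaps:
        out.append((prev, s))
        prev = e
    out.append((prev, length))
    return out


def split_panels(img: Image, box: Optional[Box] = None) -> List[Box]:
    """Split a region along straight white separators until none remain."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    top, left, bottom, right = box

    # Look for horizontal separators first.
    blank_rows = [all(img[r][c] == BACKGROUND for c in range(left, right))
                  for r in range(top, bottom)]
    row_gaps = _interior_runs(blank_rows)
    if row_gaps:
        parts = [(top + a, left, top + b, right)
                 for a, b in _segments(row_gaps, bottom - top)]
    else:
        # No horizontal separator: look for vertical ones.
        blank_cols = [all(img[r][c] == BACKGROUND for r in range(top, bottom))
                      for c in range(left, right)]
        col_gaps = _interior_runs(blank_cols)
        if not col_gaps:
            return [box]  # stopping condition: no straight separator found
        parts = [(top, left + a, bottom, left + b)
                 for a, b in _segments(col_gaps, right - left)]

    panels: List[Box] = []
    for part in parts:
        panels.extend(split_panels(img, part))
    return panels
```

For a 5x5 toy image with one wide panel on top and two panels below, `split_panels` returns three boxes: the top strip is kept whole because no vertical gutter runs through it, while the bottom strip is split in two on the second pass. Real figures would need a tolerance for near-white pixels and noise, which this sketch leaves out.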
