Visual graph analysis for quality assessment of manually labelled documents image database

The context of this paper is the labelling of a document image database in an industrial process. Our work focuses on assessing the quality of a given labelled database. In most practical cases, a database is labelled manually by an operator who has to browse the images sequentially (presented as thumbnails) until the whole database is labelled. This task is very repetitive; moreover, the filing plan defining the names and number of classes is often incomplete, which leads to many labelling errors. The question is then whether the quality of a labelled batch is good enough for the batch to be accepted as a whole. Our objective is to ease and speed up that evaluation, which can take up to 1.5 times as long as the labelling work itself. We propose an interactive tool for visualizing the data as a graph. The graph highlights similarities between documents as well as the labelling quality. We define criteria on the graph that characterize the three types of errors an operator can make: an image is mislabelled, one class should be split into more pertinent subclasses, or several classes should be merged into one. This allows us to focus the operator's attention on potential errors. The operator can then count the errors encountered while auditing the database and decide whether the global labelling quality is acceptable.
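
To make the idea of a graph criterion for mislabelled images concrete, here is a minimal sketch in Python. It is not the criterion defined in the paper, which the abstract does not spell out. It assumes that each document is described by a numeric feature vector, that the similarity graph links each document to its k nearest neighbours, and that a document is suspicious when fewer than half of its neighbours share its label. The names `knn_graph`, `suspicious_labels`, and `threshold` are hypothetical.

```python
import math


def knn_graph(features, k):
    """Link each document to its k nearest neighbours (Euclidean distance).

    Assumption: a k-nearest-neighbour graph stands in for the similarity
    graph; the paper may build its graph differently.
    """
    graph = {}
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


def suspicious_labels(graph, labels, threshold=0.5):
    """Return the nodes whose label is shared by less than `threshold`
    of their neighbours (a hypothetical mislabelling criterion)."""
    flagged = []
    for node, neighbours in graph.items():
        if not neighbours:
            continue
        agree = sum(labels[n] == labels[node] for n in neighbours)
        if agree / len(neighbours) < threshold:
            flagged.append(node)
    return flagged


# Toy data: two well-separated groups of documents; document 2 sits in the
# first group but carries the label of the second one.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = ["invoice", "invoice", "letter", "letter", "letter", "letter"]

graph = knn_graph(features, k=2)
flagged = suspicious_labels(graph, labels)
print(flagged)
```

In this toy example only document 2 is flagged, because both of its neighbours are labelled differently. An auditing tool could use such a list to direct the operator to the images most worth checking first; analogous neighbourhood tests could be imagined for classes to split or merge, but those are not sketched here.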
