Towards an Efficient Way of Building Annotated Medical Image Collections for Big Data Studies

Annotating large collections of medical images is essential for building robust image analysis pipelines for applications such as disease detection. Annotation requires expert input, which is costly and time-consuming. Semiautomatic labeling and expert sourcing can speed up the process of building such collections, and in this work we report innovations in both areas. First, we have developed an algorithm, inspired by active learning and self-training, that significantly reduces the number of annotated training images needed for a classifier to reach a given level of accuracy. The algorithm iterates over labeling, classifier training, and testing; it requires only a small set of labeled images at the start, complemented by human labeling of difficult test cases at each iteration. Second, we have built a platform for large-scale management and indexing of data and users, and for creating and assigning tasks such as labeling and contouring in big data medical imaging studies. The platform is web-based and provides tooling for both researchers and annotators within a simple, dynamic user interface. It also streamlines the iterative training and labeling required by algorithms such as the active learning and self-training approach described here. We demonstrate that the combination of the platform and the proposed algorithm significantly reduces the workload involved in building a large collection of labeled cardiac echo images.
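
The iterative loop summarized in the abstract (train on a small labeled seed, auto-accept confident predictions, and send difficult cases to a human) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the confidence measure, the thresholds `low` and `high`, the per-round `budget`, and all function names are assumptions chosen only to keep the example self-contained, and synthetic 2-D points stand in for image features.

```python
import math
import random


def train_centroids(X, y):
    """Fit a toy nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, m)) for c, m in centroids.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def build_labeled_set(X_seed, y_seed, X_pool, oracle,
                      rounds=5, low=0.6, high=0.9, budget=10):
    """Hypothetical active-learning / self-training loop.

    Each round: retrain, self-label confident samples (>= high), ask the
    human oracle about at most `budget` uncertain samples (< low), and
    leave everything else in the pool for the next round.
    """
    X_lab, y_lab = list(X_seed), list(y_seed)
    pool, n_expert = list(X_pool), 0
    for _ in range(rounds):
        if not pool:
            break
        model = train_centroids(X_lab, y_lab)
        # Least confident first, so the expert budget goes to the hardest cases.
        scored = sorted(((predict_with_confidence(model, x), x) for x in pool),
                        key=lambda item: item[0][1])
        remaining, asked = [], 0
        for (label, conf), x in scored:
            if conf < low and asked < budget:
                X_lab.append(x)
                y_lab.append(oracle(x))   # human annotation of a difficult case
                asked += 1
            elif conf >= high:
                X_lab.append(x)
                y_lab.append(label)       # self-training: trust the classifier
            else:
                remaining.append(x)
        n_expert += asked
        pool = remaining
    return train_centroids(X_lab, y_lab), n_expert, len(pool)


# Synthetic stand-in for image features: two Gaussian blobs in 2-D.
random.seed(0)
truth = {}
for cls, center in ((0, 0.0), (1, 4.0)):
    for _ in range(105):
        truth[(random.gauss(center, 1.0), random.gauss(center, 1.0))] = cls

points = list(truth)
seed_points = ([p for p in points if truth[p] == 0][:5]
               + [p for p in points if truth[p] == 1][:5])
pool_points = [p for p in points if p not in seed_points]

model, n_expert, n_left = build_labeled_set(
    seed_points, [truth[p] for p in seed_points], pool_points, truth.__getitem__)
print(f"expert labels requested: {n_expert}, samples left unlabeled: {n_left}")
```

In this sketch the expert is consulted only for low-confidence samples and at most `budget` times per round, which is the mechanism by which such a loop can reduce annotation workload; the actual selection criteria and classifier used in the paper may differ.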