Small Data: A Ubiquitous, Yet Untapped, Resource for Low-Cost Imaging Innovation

PET, conventional nuclear imaging, and most other contemporary medical imaging modalities are inherently digital technologies. Over the last several decades, the digital computing landscape has been transformed with respect to speed, storage cost, infrastructure, and available expertise. However, our use of data, and indeed our whole understanding of the role of data in emission imaging, has remained relatively unchanged. If we reflect on this resource, generated ubiquitously in our daily imaging procedures, we can recognize that we have the capacity to use this information beyond present convention and that the raw data from nuclear imaging studies can be tapped to fuel innovation. We generally think of image data as DICOM-format images, essentially analogous to film and representing a quantity of source signal distributed in space. However, the signals used to create these images in nuclear medicine originate in a much denser form: our scanners capture highly detailed time, location, and energy information for individual decay events. The current practice in PET, for example, is to truncate this information, through assumptions and reconstruction techniques, into a representation of tracer emissions distributed in recognizable Cartesian space. This process of generating images that represent biodistribution has essentially defined nuclear imaging for half a century. The practice of discarding the unused information is heavily ingrained, likely because, for most of the field’s existence, it has been expensive and impractical to save raw acquisition data. The costs associated with saving data, however, have never been static.
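The information loss described above can be illustrated with a deliberately simplified sketch. It assumes a hypothetical list-mode record holding a detection time, a 2-D position, and an energy for each event, plus an illustrative 435-585 keV energy window and 4-mm pixels; none of these names or parameters come from a real scanner format. Real PET systems record coincident photon pairs along lines of response, and clinical images come from tomographic reconstruction rather than direct binning. The sketch only shows that once events are collapsed into per-pixel counts, their individual timing and exact positions can no longer be recovered.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ListModeEvent:
    """One detected event in a hypothetical, simplified list-mode record."""
    time_ms: float     # detection time stamp
    x_mm: float        # detected position, simplified to 2-D
    y_mm: float
    energy_kev: float  # deposited photon energy


def bin_to_image(events, pixel_mm=4.0, energy_window=(435.0, 585.0)):
    """Collapse list-mode events into per-pixel counts.

    Events outside the (illustrative) energy window are dropped; for the rest,
    the time stamp, exact position, and energy are discarded and only a count
    per pixel survives.
    """
    lo, hi = energy_window
    counts = Counter()
    for ev in events:
        if lo <= ev.energy_kev <= hi:
            pixel = (int(ev.x_mm // pixel_mm), int(ev.y_mm // pixel_mm))
            counts[pixel] += 1
    return counts


events = [
    ListModeEvent(time_ms=0.5, x_mm=1.0, y_mm=2.0, energy_kev=511.0),
    ListModeEvent(time_ms=1200.5, x_mm=3.0, y_mm=1.0, energy_kev=505.0),
    ListModeEvent(time_ms=2.0, x_mm=10.0, y_mm=10.0, energy_kev=300.0),  # outside window
]
print(dict(bin_to_image(events)))  # {(0, 0): 2}
```

Here the two in-window events, recorded 1.2 s apart, land in the same pixel and become indistinguishable; per-event timing of this kind is what retrospective reframing and data-driven motion correction draw on when the raw data are kept.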
In 1980, a gigabyte of storage cost approximately $600,000 (inflation-adjusted) (1); by 1990 that cost had fallen to $15,000, and by 2016 to $0.02, and we can confidently project that this trend will continue. At the 2016 price, retaining a 2-GB raw PET acquisition file costs about $0.04, approximately 0.001% of the market cost of a scan. Both the cost and the capacity of digital imaging have undergone a slow but, in aggregate, very large shift. Each year, data-driven solutions become more practical and more relevant than in the year before, as shown in Figure 1. In the 1990s, we passed a milestone when digital storage became more cost-effective than paper storage (2). We may now have passed a new threshold: the cost of saving raw imaging acquisition data is negligible relative to the cost of generating the data. Furthermore, with ionizing radiation imaging, the cost of data is not only economic. Because patients are exposed to radiation, at some risk to their health, to generate these data, it is prudent for us to periodically reconsider whether our practices make optimal use of them. One reason to move our data-saving practice toward more robust access and archiving is that a body of literature already shows that access to raw data can enable creative innovation. As an example, our group recently published a study showing that large populations of scans can be corrected for motion using advanced data-use techniques, without gating equipment or modified acquisition procedures (3). Respiratory, cardiac, and head motion correction; signal and dose optimization; open-source reconstruction; and retrospective reframing have also begun to be explored (4). Progress in these areas and the impact of data-based