Accelerating In-memory Cross Match of Astronomical Catalogs

New astronomy projects generate observation images continuously, and these images are converted into tabular catalogs online. Each new table, called a sample table, is compared against a reference table covering the same patch of sky, both to annotate the stars that match those in the reference and to identify transient objects that have no match. This cross match must finish within a few seconds so that alerts can be issued promptly and data products can be shipped off the pipeline. To perform this online cross match of tables of celestial objects, we propose two parallel algorithms, zoneMatch and gridMatch, both of which partition celestial objects by their locations in the spherical coordinate system. Specifically, zoneMatch divides the observation area along the declination coordinate of the celestial sphere, whereas gridMatch uses a two-dimensional grid over declination and right ascension. With the reference table indexed by zones or grid cells, we match the stars in the sample table through parallel index probes on the reference. We implemented these algorithms on a multicore CPU as well as a desktop GPU and evaluated their performance on both synthetic and real-world astronomical data. Our results show that gridMatch is faster than zoneMatch at the cost of additional memory, and that parallelization achieves speedups of orders of magnitude.
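
To make the zone idea concrete, the following is a minimal sequential sketch in Python. It is an illustration under stated assumptions, not the implementation evaluated here. The names `zone_of`, `build_zone_index`, `cross_match`, `zone_height`, and `radius` are hypothetical, coordinates are assumed to be (right ascension, declination) pairs in degrees, and each probed zone is scanned linearly rather than narrowed by right ascension. A grid variant would presumably add a second key on right ascension.

```python
import math
from collections import defaultdict


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def zone_of(dec, zone_height):
    """Map a declination in [-90, 90] to an integer zone id (hypothetical helper)."""
    return int(math.floor((dec + 90.0) / zone_height))


def build_zone_index(reference, zone_height):
    """Group reference objects by declination zone: zone id -> [(ra, dec, ref_idx)]."""
    zones = defaultdict(list)
    for idx, (ra, dec) in enumerate(reference):
        zones[zone_of(dec, zone_height)].append((ra, dec, idx))
    return zones


def cross_match(sample, zones, zone_height, radius):
    """For each sample object, return the index of the nearest reference object
    within `radius` degrees, or None if there is no match (a transient candidate).

    Each probe is independent of the others, which is what makes the loop
    over sample objects a natural candidate for parallel execution.
    """
    results = []
    for ra, dec in sample:
        best_idx, best_sep = None, radius
        # Only zones overlapping [dec - radius, dec + radius] can hold a match.
        z_lo = zone_of(max(dec - radius, -90.0), zone_height)
        z_hi = zone_of(min(dec + radius, 90.0), zone_height)
        for z in range(z_lo, z_hi + 1):
            for ref_ra, ref_dec, idx in zones.get(z, ()):
                sep = angular_sep_deg(ra, dec, ref_ra, ref_dec)
                if sep <= best_sep:
                    best_idx, best_sep = idx, sep
        results.append(best_idx)
    return results


# Usage: one-arcsecond match radius, 0.1-degree zones (both values are arbitrary).
reference = [(10.0, 20.0), (10.5, 20.0), (200.0, -45.0)]
sample = [(10.0001, 20.0001), (50.0, 50.0)]
zones = build_zone_index(reference, zone_height=0.1)
matches = cross_match(sample, zones, zone_height=0.1, radius=1.0 / 3600.0)
```

In this sketch the first sample object lies well within one arcsecond of reference object 0, while the second has no reference object nearby and is reported as `None`. Smaller zones reduce the number of candidates per probe but increase the number of zones to index, which is one way to picture the time and memory trade-off mentioned above.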