Cross-Matching Large Astronomical Catalogs on Heterogeneous Clusters

Cross-matching astronomical catalogs is a central operation in astronomical data integration and analysis. Because current commodity clusters typically combine heterogeneous processors, including both multi-core CPUs and GPUs, we study how to cross-match large astronomical catalogs efficiently on such clusters. Specifically, we develop a common three-phase algorithm for parallel cross-matching and optimize it for three settings: a single GPU, multiple GPUs on one node, and a heterogeneous cluster of multiple nodes. Furthermore, we study the performance impact of data chunk size and of the inter-node communication mechanisms used in the cluster. Our results show that, with suitable design choices and optimizations, billion-record catalogs can be cross-matched in under 10 minutes on a seven-node CPU-GPU cluster.
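The abstract does not name the three phases, so the following is only a minimal sketch of the core cross-match operation, not the authors' algorithm. It assumes both catalogs are in-memory lists of (RA, Dec) pairs in degrees, buckets the second catalog into declination zones (a partitioning in the style of the Zones algorithm), and tests each candidate pair with the haversine great-circle distance. The function names `cross_match`, `ang_sep_deg`, and `zone_of` and the `zone_height` parameter are hypothetical.

```python
import math
from collections import defaultdict


def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula, handles RA wrap-around)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    s = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(s))))


def zone_of(dec, zone_height):
    """Index of the declination band (zone) that contains dec."""
    return int(math.floor((dec + 90.0) / zone_height))


def cross_match(cat_a, cat_b, radius_deg, zone_height=None):
    """Return sorted (i, j) pairs with separation(cat_a[i], cat_b[j]) <= radius_deg.

    Hypothetical sketch: cat_a and cat_b are lists of (ra_deg, dec_deg).
    """
    h = radius_deg if zone_height is None else zone_height
    if h <= 0:
        raise ValueError("zone height must be positive")

    # Bucket catalog B by declination zone.
    zones = defaultdict(list)
    for j, (ra, dec) in enumerate(cat_b):
        zones[zone_of(dec, h)].append(j)

    # A true match differs in Dec by at most radius_deg, so each source in A
    # only needs to scan the zones overlapping [dec - radius, dec + radius].
    pairs = []
    for i, (ra, dec) in enumerate(cat_a):
        z_lo = zone_of(max(-90.0, dec - radius_deg), h)
        z_hi = zone_of(min(90.0, dec + radius_deg), h)
        for z in range(z_lo, z_hi + 1):
            for j in zones.get(z, ()):
                if ang_sep_deg(ra, dec, *cat_b[j]) <= radius_deg:
                    pairs.append((i, j))

    # Sort so the output is deterministic regardless of zone iteration order.
    pairs.sort()
    return pairs
```

Because a source only needs the zones within ±radius of its declination, chunks of catalog A can be matched independently against a small slice of catalog B; this is the property that makes such partitioning easy to spread across threads, GPUs, or nodes. A smaller zone height means more zones scanned per source but fewer candidates per zone.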
