Space-filling Curves for High-performance Data Mining.

Space-filling curves such as the Hilbert curve, the Peano curve, and the Z-order map points with natural or real coordinates from a two- or higher-dimensional space to a one-dimensional space while preserving locality. They have numerous applications, including search structures, computer graphics, numerical simulation, and cryptography, and can be used to make various algorithms cache-oblivious. In this paper, we describe some details of the Hilbert curve. We define the Hilbert curve in terms of a finite automaton of Mealy type, which converts two-dimensional coordinates into the Hilbert order value and vice versa in a logarithmic number of steps. We also define a context-free grammar that generates the whole curve in time linear in the number of generated coordinate/order-value pairs, i.e., in constant time per coordinate pair or order value. We further review two different strategies that enable the generation of curves without the usual restriction to square-like grids whose side length is a power of two. Finally, we elaborate on a few applications, namely matrix multiplication, Cholesky decomposition, the Floyd-Warshall algorithm, k-Means clustering, and the similarity join.
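To make the conversion between coordinates and Hilbert order values concrete, the following is a minimal sketch in Python. It uses the widely known bitwise rotate-and-reflect formulation, not the Mealy automaton or the context-free grammar defined in this paper; the function names `xy_to_hilbert`, `hilbert_to_xy`, and `_rotate` are illustrative assumptions. The sketch assumes a square grid whose side length `n` is a power of two, and it consumes one bit of each coordinate per loop iteration, so each conversion takes a number of steps logarithmic in `n`.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/reflect a quadrant so that the sub-curve has the base orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x  # swap the axes
    return x, y


def xy_to_hilbert(n, x, y):
    """Map (x, y) in an n-by-n grid (n a power of two) to its Hilbert order value."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0   # current bit of x
        ry = 1 if (y & s) else 0   # current bit of y
        d += s * s * ((3 * rx) ^ ry)  # quadrant index 0..3, weighted by quadrant size
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_to_xy(n, d):
    """Inverse mapping: Hilbert order value d to (x, y) in an n-by-n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4   # move on to the next base-4 digit of d
        s *= 2
    return x, y


if __name__ == "__main__":
    n = 4
    # Print the coordinates in Hilbert order for a 4x4 grid.
    print([hilbert_to_xy(n, d) for d in range(n * n)])
```

Enumerating the whole curve by calling `hilbert_to_xy` for every order value costs a logarithmic number of steps per point; the grammar-based generation described above is the paper's way of reducing this to constant time per point.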
