Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis