In this short paper, we propose the split-diffuse (SD) algorithm, which takes the output of an existing word embedding algorithm and distributes the data points uniformly across the visualization space. The resulting layout is easier for a human to perceive and interact with.
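To make the idea of distributing points uniformly concrete, the following is a minimal sketch and not the SD algorithm itself, whose details are not given here. It assumes a 2-D embedding as input and uses a recursive median split along alternating axes to assign each point to a cell of a square grid. The function name `spread_to_grid` and the parameter `depth` are hypothetical, introduced only for illustration.

```python
import numpy as np


def spread_to_grid(points, depth):
    """Assign each 2-D point to a cell of a (2**depth x 2**depth) grid.

    Illustrative assumption only: a recursive median split, not the
    paper's SD algorithm. Returns an (n, 2) integer array of cell
    coordinates, one row per input point.
    """
    points = np.asarray(points, dtype=float)
    cells = np.zeros((len(points), 2), dtype=int)

    def split(idx, level):
        # Stop once every axis has received `depth` bits, or nothing is left.
        if level == 2 * depth or len(idx) == 0:
            return
        axis = level % 2  # alternate between x (0) and y (1)
        # Sort the current subset along the chosen axis and cut at the median.
        order = idx[np.argsort(points[idx, axis], kind="stable")]
        half = len(order) // 2
        lower, upper = order[:half], order[half:]
        # Append one binary digit to the cell coordinate on this axis:
        # 0 for the lower half, 1 for the upper half.
        cells[idx, axis] *= 2
        cells[upper, axis] += 1
        split(lower, level + 1)
        split(upper, level + 1)

    split(np.arange(len(points)), 0)
    return cells


# Usage: 16 clustered points spread over a 4 x 4 grid.
rng = np.random.default_rng(0)
embedding = rng.normal(loc=5.0, scale=0.1, size=(16, 2))
grid_cells = spread_to_grid(embedding, depth=2)
occupied, counts = np.unique(grid_cells, axis=0, return_counts=True)
print(len(occupied), counts.max())
```

Because every split halves the current subset, the points end up evenly spread over the grid however tightly they cluster in the input, while the relative ordering along each split axis is preserved. With 16 points and `depth=2`, each of the 16 cells holds exactly one point.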
We apply the SD algorithm to analyze user behavior from access logs in the cyber security domain. The result, which we call topic grids, is a set of grids covering various topics generated from the logs. On the same set of grids, different behavioral metrics can be displayed for different targets over different periods of time, giving human experts a shared surface for visualization and interaction.
Analysis, investigation, and other types of interaction can be performed more efficiently on the topic grids than on the output of existing dimension reduction methods. Beyond the cyber security domain, the topic grids can be applied to other domains, such as e-commerce, credit card transactions, and customer service, to analyze behavior at large scale.