SEEM: a scalable visualization for comparing multiple large sets of attributes for malware analysis

The number of observed malware samples has increased rapidly in recent years, expanding the workload of malware analysts. Most of these samples are not truly unique, but are related through shared attributes. Identifying these attributes can enable analysts to reuse prior analysis and reduce their workload. Visualizing malware attributes as sets could help analysts better understand the similarities and differences between malware samples. However, existing set visualizations have difficulty displaying hundreds of sets with thousands of elements, and they are not designed to compare different types of elements between sets, such as the imported DLLs and callback domains across malware samples. Such analysis might help analysts understand, for example, whether a group of malware samples is behaviorally different or merely changes where it sends data. To support comparisons between malware samples' attributes, we developed the Similarity Evidence Explorer for Malware (SEEM), a scalable visualization tool for simultaneously comparing a large corpus of malware across multiple sets of attributes (such as the sets of printable strings and function calls). SEEM's novel design breaks malware attributes down into sets of meaningful categories that can be compared across malware samples. It further incorporates set comparison overviews and dynamic filtering, which allow SEEM to scale to hundreds of malware samples while still letting analysts compare thousands of attributes between samples. We demonstrate how to use SEEM by analyzing a malware sample from the Mandiant APT1 New York Times intrusion dataset. Furthermore, we describe a user study in which five cyber security researchers used SEEM to rapidly and successfully gain insight into malware after only 15 minutes of training.
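
The idea of comparing samples category by category can be illustrated with a minimal sketch. It is not SEEM's implementation: the abstract does not name a similarity measure, so Jaccard similarity is assumed here, and the `Sample` type, the `compare_samples` helper, the category names, and the sample data are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical representation: each sample maps an attribute category
# (e.g. "imported_dlls") to the set of values observed for that category.
Sample = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; taken to be 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def compare_samples(x: Sample, y: Sample) -> Dict[str, float]:
    """Return one similarity score per attribute category present in either sample."""
    categories = sorted(set(x) | set(y))
    return {c: jaccard(x.get(c, set()), y.get(c, set())) for c in categories}


# Made-up samples: identical imports, different callback domains.
sample_a: Sample = {
    "imported_dlls": {"kernel32.dll", "ws2_32.dll", "advapi32.dll"},
    "callback_domains": {"example-a.test"},
}
sample_b: Sample = {
    "imported_dlls": {"kernel32.dll", "ws2_32.dll", "advapi32.dll"},
    "callback_domains": {"example-b.test"},
}

scores = compare_samples(sample_a, sample_b)
print(scores)
```

In this made-up case the import sets match exactly while the callback domains do not overlap, which is the kind of pattern the abstract describes as samples "merely changing where they send data". Keeping a separate score per category, rather than one pooled score, is what makes that distinction visible.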
