ESub: Exploration of Subgraphs. A tool for exploring models generated by Graph Mining algorithms

In this demo we introduce ESub, a tool for visualizing the outcome of a frequent subgraph mining (FSM) algorithm, namely SUBDUE. ESub was developed to support a graph-based methodology, proposed in our previous works, for analyzing unstructured processes. Graph-based techniques offer the user a different perspective on a process. Only the most relevant subprocesses (i.e., subgraphs) are displayed, rather than the complete, end-to-end process schema, which is often very chaotic in unstructured domains. ESub allows the user to visualize and interact with these subgraphs. It also allows the user to visualize the original graphs of the set and to compress them by means of the most relevant subgraphs, which yields a simplified view of the overall process.

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

1 SubProcesses Analysis

Process Mining (PM) methods aim at extracting, from an event log, a process schema that describes the flow of the performed activities [1]. However, classical PM techniques applied to so-called “Spaghetti Processes” (i.e., processes with little or no structure) usually generate a very chaotic model that a human analyst can hardly understand. As a remedy, in previous work [2] we proposed an alternative methodology for analyzing spaghetti processes, aimed at extracting their most relevant subprocesses. The approach is graph-based. The event log of the process is first transformed into a set of graphs, each describing the execution of one process instance, and the subprocesses are then extracted from these graphs. In this paper, we present ESub, a tool we developed to manage the set of subprocesses extracted from a spaghetti process, represented as subgraphs. ESub is designed to offer advanced functionalities for the visualization and analysis of subprocesses. It also provides a flexible mechanism for simplifying the visualization of the overall process through compression, i.e., by replacing each subgraph with a single node. The user can set the desired level of compression.

The current implementation of ESub manages the set of subgraphs extracted by SUBDUE [4], a hierarchical clustering algorithm. However, adapters can easily be built to apply the tool to the results of other FSM algorithms. SUBDUE returns a hierarchy. Top-level subgraphs are built only from elements of the original graph set (i.e., nodes and edges), while lower-level subgraphs include higher-level subgraphs in their definition. Typically, the top-level subgraphs are the most relevant ones. SUBDUE also labels each subgraph according to the order in which it was extracted. In [3] we discussed a set of measures for evaluating SUBDUE subgraphs, and ESub currently takes two of them into account. The first is the frequency (FREQ), i.e., the number of occurrences of a subgraph in the graph set. The second is the representativeness (REP), i.e., the percentage of graphs in which the subgraph occurs at least once.

ESub is implemented as a web application. Most of its functionalities are accessed from a left-side menu organized in tabs, as shown in Figure 1. The user expands a tab by clicking on its label. The following subsections describe the two main groups of functionalities offered by the tool: the visualization and the compression of subgraphs.

1.1 SubProcesses Visualization

There are two main kinds of visualization functionalities: a) navigation and b) filtering of subgraphs.
For group a), the user first uses the load tab to upload the file containing the subgraphs to visualize. From the load tab, the user can choose to upload either the output returned by SUBDUE (a SUBS file) or a more generic DOT file, depending on which one is available. It is also possible, although not mandatory, to load a LOG file. This file stores the FREQ and REP values of the subgraphs, which can then be exploited while exploring them. After the upload, each subgraph is displayed as a single node, with the label assigned to it by SUBDUE. This compact representation provides an overview of the complete hierarchy extracted by SUBDUE. The user can expand a node either by selecting it and using the expand tab, or simply by double-clicking it. Similarly, an expanded node can be compressed by selecting it and using the compress tab, or by double-clicking it. The expand/compress tabs also let the user expand or compress several nodes at once (possibly the entire hierarchy). Figure 1 shows the uploaded subgraph set, with nodes SUB2 and SUB31 expanded. Note that SUB2 is a “parent” of SUB31, since SUB31 involves SUB2, as the figure shows. The Layout tab lets the user adjust the visualization to their needs. In particular, the width and the height of the displayed outcome can be increased or decreased, and the movement of the nodes can be enabled or disabled. There is also a right-side menu listing the subgraphs that are currently compressed or expanded. Clicking the name of one of them surrounds the corresponding node in the hierarchy with a red square.

Fig. 1: Uploaded subgraphs set, with two subgraphs expanded

The functionalities of group b) help users to easily detect the most interesting subgraphs.
It is unrealistic to expect the user to analyze every single subgraph, since there can be hundreds of them. Hence, the tool implements functionalities that support searching for, or filtering, specific subgraphs. The first and simplest is the search tab, which lets the user search subgraphs by keywords. The second is the filter tab, which detects the subgraphs that match given values of FREQ and/or REP. The filter can also be used with the default parameter settings. In this case, it lists all the top-level subgraphs together with their FREQ and REP values. Note that the filter takes into account only the top-level subgraphs. Since they are assumed to be the most relevant ones, FREQ and REP are computed only for them.

The tool also allows the user to export either the entire hierarchy or a set of selected subgraphs. More precisely, the export functionality saves the visualized hierarchy in either SVG or DOT format, and the corresponding file is downloaded automatically. It is also possible to select a set of subgraphs and visualize them in a separate web page. This is extremely useful when only a subset of the hierarchy has to be analyzed. In the new page, selecting the export tab again generates the SVG or DOT file corresponding only to the exported portion of the hierarchy.

Fig. 2: Filtering Tab
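The filter tab and the export of a selected portion of the hierarchy can be sketched in a few lines of Python, together with the two measures they rely on. This is a hypothetical illustration, not ESub's implementation. The following are all our own assumptions:

- the `Sub` record and its fields;
- the helper names `freq`, `rep`, `filter_tab` and `export_dot`;
- the use of minimum thresholds for FREQ and REP;
- the DOT layout, with one cluster per exported subgraph.

The sketch also presumes that the number of instances of each subgraph in each graph is already known. Following SUBDUE's hierarchy, a subgraph counts as top-level when its definition references no other subgraph.

```python
from dataclasses import dataclass, field


@dataclass
class Sub:
    """Hypothetical record for one SUBDUE subgraph (not ESub's data model)."""
    label: str                                     # e.g. "SUB2", assigned by SUBDUE
    edges: list[tuple[str, str]]                   # edges of its definition
    uses: set[str] = field(default_factory=set)    # subgraphs used in its definition
    occurrences: dict[str, int] = field(default_factory=dict)  # graph id -> instances

    @property
    def top_level(self) -> bool:
        # Built only from original nodes and edges: references no other subgraph.
        return not self.uses


def freq(s: Sub) -> int:
    # FREQ: number of occurrences of the subgraph in the graph set.
    return sum(s.occurrences.values())


def rep(s: Sub, n_graphs: int) -> float:
    # REP: percentage of graphs containing the subgraph at least once.
    # n_graphs is explicit because graphs without occurrences may be absent.
    return 100.0 * sum(1 for c in s.occurrences.values() if c > 0) / n_graphs


def filter_tab(subs, n_graphs, min_freq=0, min_rep=0.0):
    # Only top-level subgraphs are scored; with the default thresholds every
    # top-level subgraph is listed together with its FREQ and REP.
    rows = []
    for s in subs:
        if not s.top_level:
            continue
        f, r = freq(s), rep(s, n_graphs)
        if f >= min_freq and r >= min_rep:
            rows.append((s.label, f, r))
    return rows


def export_dot(subs, selected):
    # Serialize only the selected subgraphs, one DOT cluster per subgraph.
    # Node ids are prefixed with the subgraph label so clusters share no nodes;
    # labels are assumed to contain no double quotes.
    lines = ["digraph export {"]
    for s in subs:
        if s.label in selected:
            lines.append(f'  subgraph "cluster_{s.label}" {{')
            lines.append(f'    label="{s.label}";')
            for a, b in s.edges:
                lines.append(f'    "{s.label}.{a}" -> "{s.label}.{b}";')
            lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy hierarchy for illustration only; SUB31 involves SUB2, as in Figure 1.
    subs = [
        Sub("SUB1", [("A", "B")], occurrences={"g1": 3, "g2": 1, "g3": 2}),
        Sub("SUB2", [("B", "C")], occurrences={"g1": 2, "g3": 1}),
        Sub("SUB31", [("SUB2", "D")], uses={"SUB2"}, occurrences={"g1": 1}),
    ]
    print(filter_tab(subs, n_graphs=4))               # [('SUB1', 6, 75.0), ('SUB2', 3, 50.0)]
    print(filter_tab(subs, n_graphs=4, min_rep=60.0))  # [('SUB1', 6, 75.0)]
    print(export_dot(subs, {"SUB2", "SUB31"}))
```

In the toy data, SUB1 occurs six times across three of the four graphs, giving FREQ = 6 and REP = 75%. SUB31 is skipped by the filter because it is not top-level. Graphviz could then render the DOT text returned by `export_dot` (e.g. `dot -Tsvg export.dot -o export.svg`), although this demo does not describe how ESub itself generates SVG.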