METHOD AND SYSTEM FOR DETECTING A PROCESS OR ACTIVITY USING RECURRENT AND CONVOLUTIONAL 1D NEURAL NETWORKS

Kvasyuk et al. Published by Technical Disclosure Commons, 2020. 5950X

Presented herein are techniques that use multiple neural networks and segmentation of network traffic to detect the presence of applications or business processes within a noisy mixture of network traffic. In addition, the techniques provide a novel way to detect unusual, ill-intentioned, and/or malicious activity, which is also a “process”, using recurrent and convolutional neural networks. The learning outcome can potentially identify compromised network infrastructure devices and/or telemetry collectors.

DETAILED DESCRIPTION

Data is being generated at high volumes, at high velocities, and with veracity by many different devices, applications, things, processes, etc. Conventional approaches to data analysis typically involve collecting the data from the sources using different collectors and then storing the data in centralized repositories. In some cases, data may be tagged or labeled to indicate the source (e.g., the device or process of origin). More often, however, data is not labeled (for various reasons), and the relationship between the data and its source (process or thing) is therefore not obvious. Similarly, sequences from multiple processes are lost in the mix of large repositories full of untagged data. In addition, cyber attackers are using techniques for attacking modern applications and “end-to-end processes.”

As described further below, the techniques presented herein propose automated and smart methods for analyzing data within telemetry collectors to glean knowledge about the underlying “processes” and application interactions. By learning about the underlying process, it is possible to create a model that describes the “end-to-end process” behavior. This information, in turn, allows the system to recognize ill-intentioned and/or malicious activity in future data streams.
Applications, Devices, Things, or Processes (collectively and generally referred to herein as “processes”) generate sequential data streams (e.g., network flows) in high volume. Data collectors aggregate such sequences of data from multiple source processes into a single repository. In numerous cases, data streams are stored in a repository in a “blended” form, that is, as a random mixture of multiple data sequences. In addition, in many cases, the elements of the data sequences are not tagged or labeled with the name of the source process that generated the telemetry data stream. For example, different processes can use the same symbols when generating telemetry data, so a symbol alone does not identify its source. The proposed techniques automatically detect the presence of known “processes” in a random mixture of “unlabeled” data sequences.

For example, as shown below in Figure 1, three processes each generate a specific sequence of symbols.

Figure 1

In addition, as shown below in Figure 2, the flow collector captures “network and/or data flows” and saves the flows into a repository as an interlaced mixture, in a much less clear form.

Figure 2

As shown, no symbol carries any information about its source process (i.e., the process from which the symbol originated). Therefore, as shown below in Figure 3, the collector stores the data from Processes A, B, and C as an unlabeled sequential mixture of symbols.
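The blending described above can be illustrated with a minimal sketch. The symbol sequences, the `interleave` helper, and the uniformly random choice of which process emits next are all assumptions made for illustration; they are not taken from the figures and are not the disclosed detection method. The sketch only shows how per-process order is preserved while source labels are lost.

```python
import random

# Hypothetical symbol sequences for three processes (illustrative only).
# Note that the processes share symbols, so a symbol does not identify its source.
PROCESSES = {
    "A": ["x", "y", "z"],
    "B": ["y", "x", "w"],
    "C": ["z", "w", "x", "y"],
}


def interleave(processes, seed=0):
    """Randomly merge the sequences while preserving each one's internal order.

    Returns a list of (process_name, symbol) pairs. The labels are kept here
    only so the result can be inspected; a collector would not store them.
    """
    rng = random.Random(seed)
    cursors = {name: 0 for name in processes}
    labeled = []
    while True:
        # Processes that still have symbols left to emit.
        pending = [n for n in processes if cursors[n] < len(processes[n])]
        if not pending:
            break
        name = rng.choice(pending)
        labeled.append((name, processes[name][cursors[name]]))
        cursors[name] += 1
    return labeled


labeled_mix = interleave(PROCESSES)

# What the collector actually stores: the symbols only, with no source labels.
unlabeled_mix = [symbol for _, symbol in labeled_mix]
```

Printing `unlabeled_mix` gives a single flat list of symbols. Recovering which process produced which symbol from that list alone is the problem the techniques address.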