Monitoring Leaked Confidential Data

During the first half of 2018, more than 945 data breaches resulted in 4.5 billion data records being compromised worldwide. Data leakage is one of the biggest security issues affecting the industrial and governmental sectors. The hemorrhage of lost data is so severe and uncontrollable that companies and institutions need to react very quickly to reduce the risk of being targeted by an attack exploiting leaked data. Unfortunately, this is not yet the case: on average, a company spends 196 days identifying a data breach and 69 additional days containing it. To reduce the identification time, we propose a solution that monitors, in real time, huge streams of leaked data published on hacking sources. These data are classified, and confidential information is precisely identified. This classification is performed by combining inference rules with a pre-trained Convolutional Neural Network model, which recognizes different patterns of confidential data. We also describe our observations from the data that we collected and identified in the context of a company monitoring use case.