Semantic Data Integration for Investigations: Lessons Learned and Open Challenges

In today’s world we are confronted every day with increasing amounts of information coming from a large variety of sources. People and corporations are producing data on a large scale, and since the rise of the internet, e-mail and social media the amount of produced data has grown exponentially. From a law enforcement perspective, we have to deal with these huge amounts of data when a criminal investigation is launched against an individual or company. Relevant questions need to be answered, such as who committed the crime, who was involved, what happened and at what time, and who was communicating and about what. Not only has the amount of data available for investigation increased enormously, but so has its complexity. Communication patterns from these sources need to be combined in order to extract entities, relations and events. Furthermore, the information management processes within crime investigations are very complex and often depend on the individual investigator’s computer skills. Nevertheless, applying natural language processing techniques to crime data can prove beneficial in several processes of the criminal justice industry. To date, it is not feasible for law enforcement agencies to examine these massive collections of crime reports in detail and obtain the answers they need; furthermore, no system is available that integrates information coming from text documents with structured data. Starting from these considerations, in this work we propose a system that supports prosecutors in identifying suspicious activities by managing information in different formats and from different sources. To tackle these problems in the criminal justice industry, the proposed work describes a complete data flow that manages the entity lifecycle and extracts the relations among these entities.