Paper information - Determining and Visualising E-mail Subsets to Support E-discovery

Determining and Visualising E-mail Subsets to Support E-discovery

Electronic discovery (E-discovery) is a legal process for investigating events in the corporate world in order to produce or obtain evidence; email communication is one such source of evidence (e.g. the Enron case). Manually investigating emails collected over a period of time is strenuous. The tools currently available on the market are based on simple keyword search, and legal firms charge companies according to the volume of information the search produces, which is then reviewed intensively by hand. This results in significant costs for the company or, in a number of cases, in a settlement because the company cannot afford the costs of E-discovery. There is therefore a great need to quickly determine, visualise and understand whether email subsets are normal or abnormal, pertinent or privileged, relevant (interesting) or immaterial. To determine the relevant subsets for a legal case and to gain insight quickly from the email communications, we propose a multi-modal and multi-level approach that generates automated visual representations using a manual keyword search facility, extracts the most relevant information from the email data and aids in comparing two subsets of information. In this paper, we discuss the literature review carried out, the initial design process, the prototypes developed and the workshops conducted. As future work, we aim to develop a full-fledged E-discovery tool that organisations could use to investigate email communications.

Mithileysh Sathiyanarayanan | Cagatay Turkay

[1] Hanspeter Pfister, et al. UpSet: Visualization of Intersecting Sets, 2014, IEEE Transactions on Visualization and Computer Graphics.

[2] Loren G. Terveen, et al. ContactMap: Organizing communication in a social desktop, 2004, TCHI.

[3] Yiming Yang, et al. The Enron Corpus: A New Dataset for Email Classification Research, 2004.