Network-based filtering for large email collections in E-Discovery

Information overload in E-Discovery proceedings makes review expensive and increases the risk of failing to produce results on time and consistently. New interactive techniques have been introduced to increase reviewer productivity. In contrast, this article proposes an alternative method that aims to reduce the volume of information during culling, so that less needs to be reviewed. The method first maps the email collection universe using straightforward statistical methods based on keyword filtering combined with date-time and custodian identities. Subsequently, a social network is constructed from the email collection and analyzed by filtering on date-time and keywords. By using the network context, we expect to gain a better understanding of the keyword hits and the ability to discard certain parts of the collection.
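The two steps described above (keyword and date-time filtering, then building a network from the surviving messages) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Email` record, the helper names `keyword_hits` and `build_network`, the substring-based keyword match, and the sample messages are all assumptions made for illustration only.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    # Hypothetical minimal record; real collections carry many more fields.
    sender: str
    recipients: list[str]
    sent: datetime
    body: str


def keyword_hits(emails, keywords, start, end):
    """Keep emails sent within [start, end] whose body contains any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        e for e in emails
        if start <= e.sent <= end and any(k in e.body.lower() for k in lowered)
    ]


def build_network(emails):
    """Count messages on each directed (sender, recipient) edge."""
    edges = Counter()
    for e in emails:
        for r in e.recipients:
            edges[(e.sender, r)] += 1
    return edges


# Invented sample data, for illustration only.
emails = [
    Email("alice@example.com", ["bob@example.com"],
          datetime(2001, 5, 1), "The contract draft is attached."),
    Email("bob@example.com", ["alice@example.com", "carol@example.com"],
          datetime(2001, 5, 3), "Re: contract terms"),
    Email("carol@example.com", ["dave@example.com"],
          datetime(2001, 5, 4), "Lunch on Friday?"),
    Email("alice@example.com", ["bob@example.com"],
          datetime(2002, 1, 10), "New contract for next year"),
]

hits = keyword_hits(emails, ["contract"],
                    datetime(2001, 1, 1), datetime(2001, 12, 31))
network = build_network(hits)

# Addresses that appear on at least one edge of the filtered network.
active = {node for edge in network for node in edge}
```

In this sketch, any address missing from `active` never appears on a keyword-hit message in the chosen period, which is one simple way the network context might suggest parts of a collection that could be set aside during culling.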
