Paper Information - Performance of Distributed Text Processing System Using Hadoop

Big Data brings new challenges to e-Discovery and digital forensics, most of them tied to how the data is processed. Because time and cost largely determine whether a digital investigation succeeds or fails, the first priority is to develop search methods that find relevant evidence in Big Data more quickly and accurately. This paper therefore introduces DTPS, a Distributed Text Processing System based on Hadoop, and explains how DTPS differs from similar research in order to show why it is needed. It also reports experimental results aimed at finding the best architecture and implementation strategy for using Hadoop MapReduce as a major part of a future e-Discovery cloud service.

Sang-Uk Shin | Taerim Lee | Kyung Hyune Rhee | Hun Kim
