Performance of Distributed Text Processing System Using Hadoop

Big Data brings new challenges to the field of e-Discovery or digital forensics, most of which concern the methods used to process data. Because time and cost are the most important factors in determining the success or failure of a digital investigation, the first priority is to develop search methods that find relevant evidence in Big Data more quickly and accurately. This paper therefore introduces DTPS, a Distributed Text Processing System based on Hadoop, and explains how it differs from similar prior work in order to show why it is needed. In addition, the paper reports experimental results aimed at identifying the best architecture and implementation strategy for using Hadoop MapReduce as a major component of a future e-Discovery cloud service.