Experimental Evaluations of MapReduce in Biomedical Text Mining

In this paper, we describe our development of two biomedical text mining applications: biomedical literature search (BLS) and biomedical association mining (BAM). The former requires relatively little computation, whereas the latter is computationally intensive. Experiments were conducted on Amazon Elastic MapReduce (EMR) with an input of 33,960 biomedical articles from the TREC (Text REtrieval Conference) 2006 Genomics Track. Our experimental results indicate that neither application scaled linearly with the number of computing nodes. BAM nevertheless achieved better scalability than BLS, because BLS performs less computation and its running time is dominated by overheads such as JVM startup, scheduling, and disk I/O. These observations imply that the existing MapReduce framework may not be suitable for online systems, such as literature search, that require quick responses.
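The effect of fixed overheads on scalability can be illustrated with a simple toy model. The sketch below is not the model or the measurements used in our experiments. It assumes, purely for illustration, that a job's running time is a constant per-job overhead plus work that divides perfectly across nodes. The function names and all numeric values (`OVERHEAD`, `LIGHT_WORK`, `HEAVY_WORK`) are hypothetical.

```python
def runtime(work_seconds, overhead_seconds, nodes):
    """Toy model: fixed overhead plus perfectly divisible work (an assumption)."""
    return overhead_seconds + work_seconds / nodes


def speedup(work_seconds, overhead_seconds, nodes):
    """Speedup relative to a single node under the toy model."""
    single = runtime(work_seconds, overhead_seconds, 1)
    return single / runtime(work_seconds, overhead_seconds, nodes)


# Hypothetical values, chosen only to contrast a light and a heavy job.
OVERHEAD = 60.0      # seconds of startup, scheduling, and I/O per job
LIGHT_WORK = 120.0   # seconds of computation for a BLS-like job
HEAVY_WORK = 3600.0  # seconds of computation for a BAM-like job

if __name__ == "__main__":
    for nodes in (1, 2, 4, 8, 16):
        light = speedup(LIGHT_WORK, OVERHEAD, nodes)
        heavy = speedup(HEAVY_WORK, OVERHEAD, nodes)
        print(f"{nodes:2d} nodes: light job x{light:.2f}, heavy job x{heavy:.2f}")
```

Under these assumptions, speedup stays below the node count for both jobs, and the job with less computation falls further short because the fixed overhead accounts for a larger share of its running time.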