High-Performance Biomedical Association Mining with MapReduce

MapReduce has been applied to data-intensive applications in many domains because of its simplicity, scalability, and fault tolerance. However, its use in biomedical association mining is still very limited. In this paper, we investigate using MapReduce to efficiently mine associations between biomedical terms extracted from a set of biomedical articles. First, biomedical terms were obtained by matching text to the Unified Medical Language System (UMLS) Metathesaurus, a biomedical vocabulary and standard database. We then developed a MapReduce algorithm that calculates a category of interestingness measures defined on the basis of a 2×2 contingency table. The algorithm consists of two MapReduce jobs and takes a stripes approach to reduce the number of intermediate results. Experiments were conducted using Amazon Elastic MapReduce (EMR) with an input of 3610 articles retrieved from two biomedical journals. Test results indicate that our algorithm scales linearly.
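To make the stripes idea and the 2×2 contingency table concrete, the following is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not the algorithm evaluated in the paper: the function names (`map_stripes`, `reduce_stripes`, `contingency`, `measures`), the toy documents, and the choice of support, confidence, and lift as example measures are all hypothetical, and the map and reduce steps are simulated with plain function calls rather than run on Hadoop or EMR.

```python
from collections import Counter, defaultdict

def map_stripes(doc_terms):
    """Map step (simulated): for each distinct term in one document, emit a
    single stripe, i.e. a Counter of the other terms it co-occurs with.
    One stripe per term replaces many individual (term, term) pair records."""
    terms = set(doc_terms)
    for a in terms:
        yield a, Counter(b for b in terms if b != a)

def reduce_stripes(pairs):
    """Reduce step (simulated): element-wise sum of all stripes per key."""
    merged = defaultdict(Counter)
    for key, stripe in pairs:
        merged[key].update(stripe)  # Counter.update adds counts
    return merged

def contingency(n_ab, n_a, n_b, n):
    """Cells of the 2x2 table for terms A and B over n documents:
    (A and B, A without B, B without A, neither)."""
    return n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab

def measures(n_ab, n_a, n_b, n):
    """Three example measures derived from the 2x2 table (illustrative choice)."""
    n11, n10, n01, n00 = contingency(n_ab, n_a, n_b, n)
    support = n11 / n
    confidence = n11 / (n11 + n10)          # P(B | A)
    lift = confidence / ((n11 + n01) / n)   # P(B | A) / P(B)
    return {"support": support, "confidence": confidence, "lift": lift}

# Hypothetical toy corpus: each document is its list of extracted terms.
docs = [
    ["aspirin", "headache", "fever"],
    ["aspirin", "headache"],
    ["aspirin", "bleeding"],
    ["fever", "infection"],
]

# Co-occurrence counts via stripes, and per-term document frequencies.
stripes = reduce_stripes(p for d in docs for p in map_stripes(d))
doc_freq = Counter(t for d in docs for t in set(d))

result = measures(
    stripes["aspirin"]["headache"],
    doc_freq["aspirin"],
    doc_freq["headache"],
    len(docs),
)
print(result)
```

In this sketch the co-occurrence counts and the per-term document frequencies are the only inputs the measures need, which is why any measure defined on the 2×2 table can be computed from the same intermediate data.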
