Parallel Algorithms for Multirelational Data Mining: Application to Life Science Problems

Data Mining (DM) algorithms construct models from available data, and these models can be very useful in both business and science. However, expressing the highly complex models that stem from structured data requires a powerful representation language. Multirelational algorithms use such a language to represent both the data and the models. The drawback is that, for very large or highly complex domains, multirelational algorithms may require long running times. Parallel implementations can substantially reduce this problem. In this chapter, we survey parallel approaches to running Inductive Logic Programming (ILP), a flavor of multirelational algorithms. We also analyze different scheduling approaches for those implementations and describe two applications where the proposed approaches may be very useful.
