A Declarative Pipeline Language for Complex Data Analysis

We introduce BANpipe, a logic-based scripting language designed to model complex compositions of time-consuming analyses. Its declarative semantics is described together with alternative operational semantics that facilitate goal-directed execution, parallel execution, change propagation and type checking. A portable implementation is provided; it supports expressing complex pipelines that may integrate different Prolog systems, and it manages files automatically.
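The abstract does not show BANpipe syntax. The following Python sketch only illustrates the general idea behind two of the operational modes it names: goal-directed execution and change propagation. It makes several assumptions. It keeps results in memory and uses version counters where a real pipeline would track files. The names `Pipeline`, `rule`, `set_input` and `get` are hypothetical and are not BANpipe's API. It does not model logic variables, parallel execution, type checking or cycle detection.

```python
from typing import Callable, Dict, List, Tuple


class Pipeline:
    """Hypothetical goal-directed evaluator with change propagation.

    Illustrative sketch only; in-memory versions stand in for files.
    """

    def __init__(self) -> None:
        # input name -> (version, value)
        self.inputs: Dict[str, Tuple[int, object]] = {}
        # task name -> (dependency names, function computing the task)
        self.rules: Dict[str, Tuple[List[str], Callable[..., object]]] = {}
        # task name -> (dependency versions used, own version, value)
        self.cache: Dict[str, Tuple[Tuple[int, ...], int, object]] = {}
        # log of tasks that were actually (re)computed, in order
        self.executed: List[str] = []

    def set_input(self, name: str, value: object) -> None:
        # Changing an input bumps its version, which invalidates dependents.
        version = self.inputs[name][0] + 1 if name in self.inputs else 1
        self.inputs[name] = (version, value)

    def rule(self, name: str, deps: List[str], fn: Callable[..., object]) -> None:
        self.rules[name] = (deps, fn)

    def _eval(self, name: str) -> Tuple[int, object]:
        if name in self.inputs:
            return self.inputs[name]
        deps, fn = self.rules[name]
        # Goal-directed: only the dependencies of the requested goal are evaluated.
        results = [self._eval(d) for d in deps]
        dep_versions = tuple(v for v, _ in results)
        cached = self.cache.get(name)
        if cached is not None and cached[0] == dep_versions:
            # No dependency changed since the last run: reuse the stored result.
            return cached[1], cached[2]
        value = fn(*(val for _, val in results))
        version = cached[1] + 1 if cached is not None else 1
        self.cache[name] = (dep_versions, version, value)
        self.executed.append(name)
        return version, value

    def get(self, goal: str) -> object:
        return self._eval(goal)[1]


if __name__ == "__main__":
    p = Pipeline()
    p.set_input("genome", "ACGTACGT")
    p.rule("gc", ["genome"], lambda s: sum(c in "GC" for c in s) / len(s))
    p.rule("report", ["gc"], lambda gc: f"GC={gc:.2f}")
    p.rule("length", ["genome"], len)

    print(p.get("report"))  # GC=0.50
    print(p.executed)       # ['gc', 'report']; 'length' was never requested
    p.get("report")         # nothing changed, so nothing is recomputed
    p.set_input("genome", "GGGG")
    print(p.get("report"))  # GC=1.00
    print(p.executed)       # ['gc', 'report', 'gc', 'report']
```

Comparing the dependency versions a result was computed from is a Make-style staleness check. A file-based pipeline would more likely compare timestamps or content hashes of files on disk.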
