The ETLMR MapReduce-Based ETL Framework

This paper presents ETLMR, a parallel Extract-Transform-Load (ETL) programming framework based on MapReduce. It has built-in support for high-level ETL-specific constructs, including star schemas, snowflake schemas, and slowly changing dimensions (SCDs). With these constructs, ETLMR offers both high programming productivity and high ETL scalability.
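To make the combination of MapReduce and SCD handling concrete, the following is a minimal, single-process sketch of how a type-2 slowly changing dimension could be processed in a map/shuffle/reduce style. It is an illustration under stated assumptions, not ETLMR's actual API: the function names (`map_phase`, `shuffle`, `reduce_scd2`, `run`), the column names (`customer_id`, `city`, `date`), and the versioning fields (`version`, `valid_from`, `valid_to`) are all hypothetical.

```python
from collections import defaultdict


def map_phase(source_rows):
    """Map step: emit (business_key, row) pairs.

    Hypothetical schema: each row has 'customer_id', 'city', and 'date'.
    """
    for row in source_rows:
        yield row["customer_id"], row


def shuffle(pairs):
    """Group rows by business key, as a MapReduce runtime would between phases."""
    groups = defaultdict(list)
    for key, row in pairs:
        groups[key].append(row)
    return groups


def reduce_scd2(key, rows):
    """Reduce step: build type-2 SCD versions for one business key.

    A new version is created only when the tracked attribute ('city')
    changes; the previous version is closed by setting its 'valid_to'.
    """
    versions = []
    for row in sorted(rows, key=lambda r: r["date"]):
        if versions and versions[-1]["city"] == row["city"]:
            continue  # no change in the tracked attribute, keep current version
        if versions:
            versions[-1]["valid_to"] = row["date"]  # close the previous version
        versions.append({
            "customer_id": key,
            "city": row["city"],
            "version": len(versions) + 1,
            "valid_from": row["date"],
            "valid_to": None,  # open-ended: this is the current version
        })
    return versions


def run(source_rows):
    """Run map, shuffle, and reduce sequentially and return the dimension rows."""
    dimension = []
    for key, rows in sorted(shuffle(map_phase(source_rows)).items()):
        dimension.extend(reduce_scd2(key, rows))
    return dimension
```

Because all rows for one business key reach the same reducer, each key's version history can be built independently of the others, which is what allows this kind of dimension processing to be parallelized.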