Natural language reporting for ETL processes

The conceptual design of Extract-Transform-Load (ETL) processes is a crucial, burdensome, and challenging procedure that takes place in the early phases of a data warehouse project. Several models have been proposed for the conceptual design and representation of ETL processes, but all share two drawbacks: they require intensive human effort from designers to create them, and technical knowledge from business people to understand them. In previous work, we eased the former difficulty by automating the conceptual design using Semantic Web technology. In this paper, we build upon our previous results and tackle the second issue by investigating the application of natural language generation techniques to the ETL environment. In particular, we provide a method for representing a conceptual ETL design as a narrative, which is the most natural means of communication and does not require knowledge of any specific model. We discuss how linguistic techniques can be used to establish a common application vocabulary. Finally, we present a flexible and customizable template-based mechanism for generating natural language representations of ETL process requirements and operations.
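
To make the idea of template-based generation concrete, the following is a minimal sketch and not the mechanism described in the paper. It assumes each ETL operation is given as a small dictionary with a type and named arguments; the template texts, the operation types (`filter`, `convert`, `load`), and the function names `describe` and `narrate` are all hypothetical, chosen only for illustration.

```python
from string import Template

# Hypothetical sentence templates, one per operation type (illustrative only;
# these are not the templates used by the authors).
TEMPLATES = {
    "filter": Template("Keep only the $entity records where $condition."),
    "convert": Template("Convert $attribute from $source_unit to $target_unit."),
    "load": Template("Load the resulting $entity records into $target."),
}


def describe(operation):
    """Render a single ETL operation (a dict with 'type' and 'args') as a sentence."""
    template = TEMPLATES[operation["type"]]
    # substitute() raises KeyError if a placeholder has no matching argument.
    return template.substitute(operation["args"])


def narrate(operations):
    """Join the sentences for a sequence of operations into one short narrative."""
    return " ".join(describe(op) for op in operations)


if __name__ == "__main__":
    # Assumed example flow; entity and attribute names are made up.
    flow = [
        {"type": "filter",
         "args": {"entity": "order", "condition": "the amount exceeds 100"}},
        {"type": "convert",
         "args": {"attribute": "price", "source_unit": "USD", "target_unit": "EUR"}},
        {"type": "load",
         "args": {"entity": "order", "target": "the sales fact table"}},
    ]
    print(narrate(flow))
```

Keeping the templates in a plain dictionary is one way to get the customizability the abstract mentions: wording can be changed or new operation types added without touching the rendering code.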
