Software Development for Rule-Based Spreadsheet Data Extraction and Transformation

This paper proposes a knowledge-based software platform for generating applications that extract and transform spreadsheet data. The platform includes a flexible table object model and a domain-specific language for expressing user-defined rules of table analysis and interpretation. Together, they represent knowledge of table layout and content features, as well as how those features are interpreted depending on the transformation goals. The platform translates such user-defined rules into Java programs. The generated source code is serialized as a project that can be built into an executable application with the Maven tool. Running the generated application transforms spreadsheet data from the arbitrary form described by the rules into a canonical form. Empirical results demonstrate that the platform can be used to develop applications that convert data from arbitrary spreadsheet tables, originating from various domains, into relational flat-file databases.
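The abstract does not show the rule language or the generated code. To make the overall idea concrete (a table object model, a rule that interprets cells, and output of canonical records), here is a minimal Java sketch. It is an illustration under stated assumptions only. The class `CanonicalTableSketch`, the `Cell` and `Entry` types, and the single hard-coded rule (first row holds column labels, first column holds row labels) are hypothetical. They are not the platform's DSL, its table object model, or the code it generates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of rule-based table interpretation. All names and the rule
// itself are illustrative assumptions, not the platform's generated code.
class CanonicalTableSketch {

    // Minimal table object model: a cell knows its position and (trimmed) text.
    static final class Cell {
        final int row;
        final int col;
        final String text;

        Cell(int row, int col, String text) {
            this.row = row;
            this.col = col;
            this.text = text == null ? "" : text.trim();
        }
    }

    // Canonical record: a data value annotated with its row and column labels.
    static final class Entry {
        final String value;
        final String rowLabel;
        final String colLabel;

        Entry(String value, String rowLabel, String colLabel) {
            this.value = value;
            this.rowLabel = rowLabel;
            this.colLabel = colLabel;
        }

        // Naive CSV line; no quoting of embedded commas (simplifying assumption).
        String toCsv() {
            return value + "," + rowLabel + "," + colLabel;
        }
    }

    // Reads a raw grid (as a spreadsheet reader might supply it) into cells, row by row.
    static List<Cell> readCells(String[][] grid) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                cells.add(new Cell(r, c, grid[r][c]));
            }
        }
        return cells;
    }

    // Hypothetical rule: row 0 holds column labels, column 0 holds row labels,
    // and every other non-empty cell is a data value labeled by both headers.
    static List<Entry> applyRule(List<Cell> cells) {
        Map<Integer, String> rowLabels = new HashMap<>();
        Map<Integer, String> colLabels = new HashMap<>();
        for (Cell cell : cells) {
            if (cell.row == 0) {
                colLabels.put(cell.col, cell.text);
            } else if (cell.col == 0) {
                rowLabels.put(cell.row, cell.text);
            }
        }
        List<Entry> entries = new ArrayList<>();
        for (Cell cell : cells) {
            if (cell.row == 0 || cell.col == 0 || cell.text.isEmpty()) {
                continue; // header cells and empty data cells produce no record
            }
            entries.add(new Entry(cell.text,
                    rowLabels.getOrDefault(cell.row, ""),
                    colLabels.getOrDefault(cell.col, "")));
        }
        return entries;
    }

    // End-to-end: grid -> cells -> rule -> flat-file lines (value,rowLabel,colLabel).
    static List<String> toCanonical(String[][] grid) {
        List<String> lines = new ArrayList<>();
        for (Entry e : applyRule(readCells(grid))) {
            lines.add(e.toCsv());
        }
        return lines;
    }

    public static void main(String[] args) {
        String[][] grid = {
            {"", "2019", "2020"},
            {"Sales", "10", "12"},
            {"Costs", "7", ""}
        };
        // Prints 10,Sales,2019 then 12,Sales,2020 then 7,Costs,2019
        toCanonical(grid).forEach(System.out::println);
    }
}
```

In this sketch the cross-tabulated grid becomes one flat record per data cell, which is the kind of canonical, relational-style output the abstract describes. In the actual platform the interpretation logic would come from user-defined rules rather than a single fixed layout, and a real flat-file writer would also need proper CSV quoting.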