Managing Unstructured Data With Structured Legacy Systems

In this paper we describe an approach and a system for managing enterprise semi-structured data in a high-throughput, nimble, and scalable system and for joining that data with traditional relational database management systems (RDBMS). This paper presents the second release of NASA's NETMARK system. NETMARK is an Enterprise Information Integration (EII) framework based on a modern "schema-less" approach. NETMARK's "schema-less" information integration reinvents the way semi-structured documents are managed within a traditional RDBMS. We describe in particular detail the unique underlying data storage approach and the efficient query processing mechanisms enabled by the newly proposed storage system upgrade. Through well-validated applications, we present an extensive evaluation of the virtual union between NETMARK and persistent schemas similar to those of commercial off-the-shelf products, such as Systems Applications and Products (SAP), which is currently used for NASA's Financial System. At the heart of the approach is a philosophy of focusing on the most common data management requirements in the enterprise, without burdening users and application developers with unnecessary complexity and formal data integration processes. This paper presents the details of achieving integration between these two incompatible systems.