Trio: a system for data, uncertainty, and lineage

In the Trio project at Stanford, we are building a new kind of database management system: one in which data, uncertainty of the data, and data lineage are all first-class citizens in an extended relational model and SQL-based query language. In an initial vision paper for the Trio project [5], we motivated the need for these three aspects to coexist in one system, and detailed numerous potential applications including scientific data management, data cleaning and integration, information extraction systems, and others. (Specific example application scenarios will be discussed in Sections 2 and 4.)

Since the inception of the project, we have:

1. Studied the space of representation schemes for uncertain data, and properties of various schemes [3, 4].
2. Proposed a new scheme called ULDBs. ULDBs extend the relational model with simple forms of uncertainty that, when combined with lineage, yield nice properties and strong expressiveness [1].
3. Proposed a SQL-based query language for ULDBs called TriQL (pronounced “treacle”). TriQL modifies the semantics of SQL to take uncertainty and lineage into account, and introduces new constructs to query uncertainty and lineage directly [2].
4. Implemented a first working prototype of our model and language by building on top of a conventional DBMS [2].