Building Data Integration Systems via Mass Collaboration

Today, data integration systems are built largely by hand, in a labor-intensive and error-prone process. In this paper we describe a conceptually new solution to this problem: mass collaboration. The basic idea is to view a data integration system as having a finite set of parameters whose values must be set. To build such a system, the administrators construct and deploy a system “shell”, then ask the users to help the system “automatically converge” to the correct parameter values. In this way the enormous burden of system development is lifted from the administrators and spread “thinly” over a multitude of users. We describe our current effort to apply this approach to the problem of schema matching in the context of data integration. We present experiments with both real and synthetic users that show the promise of the approach. Finally, we discuss future work, challenges, and potential applications of the approach beyond the data integration context.
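
To make the idea of parameters that "converge" through user input concrete, the following is a minimal illustrative sketch, not the mechanism developed in this paper. It assumes that each parameter is a yes/no schema-matching question ("does this source attribute match this mediated-schema attribute?") and that the system accepts a value once enough users have answered and a large enough fraction of them agree. The class name `MatchParameter`, the thresholds `min_votes` and `min_agreement`, and the attribute names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchParameter:
    """One boolean system parameter: does source_attr match mediated_attr?"""
    source_attr: str
    mediated_attr: str
    min_votes: int = 5          # hypothetical: answers required before deciding
    min_agreement: float = 0.8  # hypothetical: fraction of answers that must agree
    yes: int = 0
    no: int = 0

    def record_answer(self, answer: bool) -> None:
        """Store one user's yes/no answer to the matching question."""
        if answer:
            self.yes += 1
        else:
            self.no += 1

    def converged_value(self) -> Optional[bool]:
        """Return the agreed value, or None while answers are too few or too mixed."""
        total = self.yes + self.no
        if total < self.min_votes:
            return None
        if self.yes >= self.min_agreement * total:
            return True
        if self.no >= self.min_agreement * total:
            return False
        return None


# Usage: six users answer the same question; five say the attributes match.
param = MatchParameter("addr", "address")
for answer in [True, True, True, False, True, True]:
    param.record_answer(answer)
print(param.converged_value())
```

Simple thresholded voting is used here only because it is easy to follow. It treats every user as equally reliable, which a deployed system would likely not want to assume.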