Structures, Semantics and Statistics

At a fundamental level, the key challenge in data integration is to reconcile the semantics of disparate data sets, each expressed with a different database structure. I argue that computing statistics over a large number of structures offers a powerful methodology for producing semantic mappings, the expressions that specify such reconciliation. In essence, the statistics offer hints about the semantics of the symbols in the structures, thereby enabling the detection of semantically similar concepts. The same methodology can be applied to several other data management tasks that involve search in a space of complex structures and in enabling the next-generation on-the-fly data integration systems.