Optimal Semijoins for Distributed Database Systems

A Bloom-filter-based semijoin algorithm for distributed database systems is presented. The algorithm uses a filter approach to reduce, as far as possible, the communication cost of processing a distributed natural join. An optimal filter is developed in pieces. Filter information is used both to recognize when the semijoin will cease to be effective and to process the semijoin optimally. An ineffective semijoin is recognized quickly and cheaply. An effective semijoin uses all of the transmitted bits optimally.
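
For orientation, the basic idea of a filter-based semijoin can be sketched in Python as follows. This is a minimal illustration of a generic, single-shot Bloom-filter semijoin, not the piecewise optimal filter construction presented here. The names `BloomFilter` and `bloom_semijoin`, the parameters `num_bits` and `num_hashes`, and the SHA-256 double-hashing scheme are all assumptions introduced for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (hypothetical helper, for illustration only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumed hashing scheme: double hashing over two 64-bit halves
        # of a SHA-256 digest of the key's repr().
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def bloom_semijoin(r_rows, s_rows, key, num_bits=1024, num_hashes=3):
    """Natural join of R and S on `key`, shipping only filtered S tuples."""
    # Site of R: summarize R's join values in a filter; only these bits are sent.
    bf = BloomFilter(num_bits, num_hashes)
    for row in r_rows:
        bf.add(row[key])

    # Site of S: ship only tuples that pass the filter.
    # False positives are possible; false negatives are not.
    shipped = [row for row in s_rows if row[key] in bf]

    # Site of R: an exact join discards any false positives.
    index = {}
    for row in r_rows:
        index.setdefault(row[key], []).append(row)
    joined = [{**r, **s} for s in shipped for r in index.get(s[key], [])]
    return shipped, joined
```

In this sketch the communication cost is the `num_bits` of the filter plus the tuples in `shipped`. A larger filter lowers the false-positive rate and so shrinks `shipped`, but costs more bits to send, which is the trade-off a filter-based semijoin has to balance.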