Scaling Up Set Similarity Joins Using a Cost-Based Distributed-Parallel Framework