A new linear approximate clustering algorithm based upon sampling with probability distribution