Clustering huge protein sequence sets in linear time