Distributed data stream clustering method and system

The invention discloses distributed data stream clustering method and system and overcomes the defect that the existing most data steam clustering algorithms are unable to run in the distributed cloud environment, unable to easily extend and low in operational time efficiency. The method includes: summarizing data streams to obtain a plurality of eigenvectors of the data streams; performing locality-sensitive hashing algorithm to obtain a plurality of clusters with each comprising at least one eigenvector, and selecting at least one cluster as a candidate cluster; periodically using the candidate cluster to cluster eigenvectors of newly arrived data streams. The real-time performance better than that of the prior art is guaranteed by the use of the method and system based on the locality-sensitive hashing algorithm.