Clustering has been an active research area of great practical importance in recent years. Most previous clustering models have focused on grouping objects with similar values on a (sub)set of dimensions (e.g., subspace clusters) and have assumed that every object has an associated value on every dimension (e.g., biclusters). These models are not always adequate for capturing the coherence exhibited among objects. Strong coherence may still exist among a set of objects (on a subset of attributes) even if they take quite different values on each attribute and the attribute values are not fully specified. This is common in many applications, including bioinformatics and collaborative filtering, where the data may be incomplete and subject to biases. In bioinformatics, a bicluster model has recently been proposed to capture coherence among a subset of the attributes. We introduce a more general model, referred to as the δ-cluster model, to capture coherence exhibited by a subset of objects on a subset of attributes while allowing absent attribute values. A move-based algorithm (FLOC) is devised to efficiently produce near-optimal clustering results. The δ-cluster model includes the bicluster model as a special case, and in that case the FLOC algorithm far outperforms the bicluster algorithm. We demonstrate the correctness and efficiency of the δ-cluster model and the FLOC algorithm on a number of real and synthetic data sets.
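The notion of coherence despite different values and missing entries can be made concrete with a small sketch. The abstract does not define the coherence measure, so the code below assumes one: a mean absolute residue, in the spirit of the residue used for biclusters, computed only over specified entries. The function name `mean_abs_residue`, the use of `None` for an absent value, and the toy ratings matrix are all illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[Optional[float]]]


def mean_abs_residue(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean absolute residue of the submatrix (rows x cols), skipping None entries.

    Assumed definition (not stated in the abstract): for each specified entry,
    residue = value - row_mean - col_mean + overall_mean, where every mean is
    taken over the specified entries of the submatrix only.
    """
    # Keep only the specified (non-missing) cells of the submatrix.
    cells: Dict[Tuple[int, int], float] = {}
    for i in rows:
        for j in cols:
            value = data[i][j]
            if value is not None:
                cells[(i, j)] = value
    if not cells:
        raise ValueError("submatrix has no specified entries")

    # Group specified values by row and by column to compute the base values.
    row_vals: Dict[int, List[float]] = {}
    col_vals: Dict[int, List[float]] = {}
    for (i, j), v in cells.items():
        row_vals.setdefault(i, []).append(v)
        col_vals.setdefault(j, []).append(v)
    row_mean = {i: sum(v) / len(v) for i, v in row_vals.items()}
    col_mean = {j: sum(v) / len(v) for j, v in col_vals.items()}
    overall = sum(cells.values()) / len(cells)

    total = sum(abs(v - row_mean[i] - col_mean[j] + overall) for (i, j), v in cells.items())
    return total / len(cells)


# Toy ratings: rows are objects (e.g., viewers), columns are attributes (e.g., movies).
ratings = [
    [1.0, 2.0, 4.0],   # baseline rater
    [3.0, 4.0, 6.0],   # same pattern, biased upward by 2
    [5.0, None, 8.0],  # similar trend, one rating absent
    [5.0, 1.0, 2.0],   # unrelated pattern
]

shifted_pair = mean_abs_residue(ratings, [0, 1], [0, 1, 2])
with_missing = mean_abs_residue(ratings, [0, 1, 2], [0, 1, 2])
with_outlier = mean_abs_residue(ratings, [0, 1, 3], [0, 1, 2])
print(shifted_pair, with_missing, with_outlier)
```

Under this assumed measure, the two raters whose values differ by a constant bias score a residue of essentially zero even though none of their ratings match, and adding the rater with an absent value raises the residue far less than adding the unrelated rater. A δ-cluster, read this way, would be a submatrix whose residue stays below a threshold δ; the exact threshold rule is likewise an assumption here.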