Comparison of Clustering Methods over a Hidden Web Data using Stratification

This paper focuses on the problem of data mining in general, and clustering in particular, over hidden web data. Data mining is the process of analyzing large amounts of data and extracting knowledge that is useful to users. Hidden (or deep) web data resides in databases on remote systems, so it can be accessed only through query interfaces such as HTML forms. Clustering such data is difficult because access is indirect, limited to the query interface, and each access takes time. A novel methodology, stratified clustering, is introduced that operates on samples of the dataset. Because samples can be obtained only by submitting queries, an efficient sampling method is required to reduce both the time consumed and the number of queries needed to access deep web data. This paper proposes a series of steps to accomplish the task: (1) the space of input attributes is partitioned into strata that capture the association between input and output attributes; (2) an efficient sampling method is proposed to obtain high estimation accuracy; (3) the samples obtained are used by two clustering methods, stratified k-means clustering and hierarchical clustering. The estimation accuracy of the cluster centers of deep web data is compared for simple random sampling against stratified sampling, and for the k-means clustering method against the hierarchical clustering method.
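The comparison described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the sampling scheme, so proportional allocation across strata is assumed here. The hidden database is a synthetic stand-in, and every name (`make_hidden_db`, `query`, the strata "A" to "D", the sample size and trial count) is hypothetical. Clustering is restricted to one output attribute, with a plain Lloyd's k-means and a centroid-linkage agglomerative method standing in for the two clustering methods.

```python
import random

def avg(xs):
    return sum(xs) / len(xs)

def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on 1-D data; returns the sorted cluster centers."""
    vs = sorted(values)
    # Deterministic start: place initial centers at evenly spaced quantiles.
    centers = [vs[(2 * i + 1) * len(vs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vs:
            groups[min(range(k), key=lambda j: abs(v - centers[j]))].append(v)
        # An empty cluster keeps its previous center.
        new = [avg(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

def agglomerative_1d(values, k):
    """Centroid-linkage hierarchical clustering on 1-D data, cut at k clusters."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        cents = [avg(c) for c in clusters]
        # In sorted 1-D data the closest pair of centroids is always adjacent.
        i = min(range(len(cents) - 1), key=lambda j: cents[j + 1] - cents[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return [avg(c) for c in clusters]

def make_hidden_db(rng):
    """Synthetic stand-in (assumption): input-attribute value -> output values."""
    stratum_means = {"A": 0.0, "B": 10.0, "C": 50.0, "D": 60.0}
    return {s: [rng.gauss(m, 1.0) for _ in range(1000)]
            for s, m in stratum_means.items()}

def query(db, input_value, n, rng):
    """Hypothetical query interface: n random outputs for one input value."""
    return rng.sample(db[input_value], n)

def simple_random_sample(db, n, rng):
    # Idealized: assumes uniform access to all records, which a real
    # deep web source would not offer directly.
    everything = [v for rows in db.values() for v in rows]
    return rng.sample(everything, n)

def stratified_sample(db, n, rng):
    # Proportional allocation (assumption): each stratum (one input value)
    # contributes in proportion to its size.
    total = sum(len(rows) for rows in db.values())
    sample = []
    for s, rows in db.items():
        sample += query(db, s, round(n * len(rows) / total), rng)
    return sample

def mean_center_error(sampler, cluster, db, truth, n=40, trials=200, seed=1):
    """Average absolute error of estimated centers over repeated samples."""
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        est = cluster(sampler(db, n, rng), len(truth))
        errs.append(avg([abs(e - t) for e, t in zip(est, truth)]))
    return avg(errs)

db = make_hidden_db(random.Random(0))
# Reference centers: k-means over the full (normally inaccessible) data.
truth = kmeans_1d([v for rows in db.values() for v in rows], k=2)
results = {
    (s.__name__, c.__name__): mean_center_error(s, c, db, truth)
    for s in (simple_random_sample, stratified_sample)
    for c in (kmeans_1d, agglomerative_1d)
}
for key, err in results.items():
    print(key, round(err, 3))
```

In this synthetic setup each cluster spans two strata, so a simple random sample lets the mix of strata inside a cluster vary from sample to sample, while the stratified sample fixes that mix. The stratified estimates of the cluster centers should therefore show the smaller error. With clusters this well separated, the two clustering methods are expected to recover essentially the same centers; less cleanly separated data would be needed to tell them apart.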