Clustering Algorithms and Latent Semantic Indexing to Identify Similar Pages in Web Applications

In this paper, we analyze some clustering algorithms that have been widely employed in the past to support the comprehension of web applications. To this end, we have defined an approach to identify static pages that are duplicated or cloned at the content level. This approach is based on a process that first computes the dissimilarity between web pages using Latent Semantic Indexing, a well-known information retrieval technique, and then groups similar pages using clustering algorithms. We consider five instances of this process, each based on three variants of the agglomerative hierarchical clustering algorithm, a divisive clustering algorithm, the k-means partitional clustering algorithm, and a widely employed partitional competitive clustering algorithm, namely Winner Takes All. In order to assess the proposed approach, we have used the static pages of three web applications and one static website.

exemplify the navigational structure of a web application according to the typology and the topology of its interconnected components, thus making detailed information accessible by exploding each cluster into sub-clusters. On the other hand, groups of pages that are similar at the content level could also be used to support the software engineer in reengineering the navigation schema of a web application, in such a way that pages containing similar content can be easily navigated and searched according to the results of the clustering process. Furthermore, groups of static pages that are similar at the content level could be generalized by extracting the similar content and maintaining it in a separate file or in a database.

Generally, several nontrivial issues have to be considered to prevent a clustering-based approach from producing unsuitable results. First of all, a software engineer should choose the features of the pages of a web application to be considered in the clustering process. These features represent the basic properties to compare
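The two-step process summarized in the abstract (dissimilarity via Latent Semantic Indexing, then clustering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the setup used in the paper: raw term counts as page features, cosine dissimilarity in the latent space, average linkage as the agglomerative variant, and a pure-Python power iteration standing in for a library SVD. The function names (`term_document_matrix`, `lsi_coordinates`, `cluster_pages`) and the sample pages are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(pages):
    """Rows are terms, columns are pages; entries are raw term counts (assumed weighting)."""
    tokens = [p.lower().split() for p in pages]
    vocab = sorted({w for t in tokens for w in t})
    counts = [Counter(t) for t in tokens]
    return [[c[w] for c in counts] for w in vocab]


def lsi_coordinates(A, latent_dims, iters=200):
    """Project pages into a latent space of `latent_dims` dimensions.

    For A = U S V^T, the page coordinates are the rows of V S. They are
    obtained here from the top eigenpairs of G = A^T A (eigenvalue = s^2),
    found by power iteration with deflation instead of a library SVD.
    """
    n = len(A[0])
    G = [[float(sum(row[i] * row[j] for row in A)) for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _dim in range(latent_dims):
        v = [1.0 + i / n for i in range(n)]  # deterministic, non-uniform start vector
        for _step in range(iters):
            w = [sum(G[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[r] * G[r][c] * v[c] for r in range(n) for c in range(n))
        sigma = math.sqrt(max(lam, 0.0))
        for d in range(n):
            coords[d].append(sigma * v[d])
        # Deflate: remove the component just found so the next pass finds the next one.
        for r in range(n):
            for c in range(n):
                G[r][c] -= lam * v[r] * v[c]
    return coords


def cosine_dissimilarity(x, y):
    """1 - cosine similarity; vectors with zero norm are treated as maximally dissimilar."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 1.0
    return 1.0 - sum(a * b for a, b in zip(x, y)) / (nx * ny)


def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering over a dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def cluster_pages(pages, latent_dims, n_clusters):
    """Full pipeline: term counts -> LSI coordinates -> dissimilarities -> clusters."""
    coords = lsi_coordinates(term_document_matrix(pages), latent_dims)
    dist = [[cosine_dissimilarity(x, y) for y in coords] for x in coords]
    return agglomerative(dist, n_clusters)


# Hypothetical page contents, used only to exercise the sketch.
sample_pages = [
    "cheap flights to rome book flights",
    "book cheap flights to paris",
    "python clustering tutorial code",
    "clustering code in python",
]
print(cluster_pages(sample_pages, latent_dims=2, n_clusters=2))
```

In this toy input the two flight pages and the two Python pages share no vocabulary, so the sketch should place them in separate clusters. Swapping `agglomerative` for a divisive, k-means, or Winner Takes All procedure would give the other instances of the process mentioned above, with the dissimilarity step left unchanged.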
