Semantic analysis of web site audience

With the emergence of the World Wide Web, analyzing and improving Web communication has become essential for adapting Web content to visitors' expectations. Web communication analysis is traditionally performed by Web analytics software, which produces long lists of page-based audience metrics. These results suffer from page synonymy, page polysemy, page temporality, and page volatility. In addition, the metrics carry little semantics and are too detailed to be exploited by organization managers and chief editors, who need summarized, conceptual information to make high-level decisions. To obtain such metrics, we mine the content of the Web pages output by the Web server. For a given taxonomy covering the Web site's knowledge domain, we compute the term weights in the output pages and aggregate them using OLAP tools to obtain concept-based metrics representing the audience of the Web site's topics. To demonstrate how our approach solves the cited problems, we compute concept-based metrics with SQL Server OLAP Analysis Service and our prototype WASA for a number of case studies. Finally, we validate our results against a popular Web analytics tool.
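
The aggregation step described above can be illustrated with a minimal sketch. It is not the WASA implementation: the taxonomy, the sample pages, and the function names are hypothetical, raw occurrence counts stand in for the term weights, and a plain sum over ancestors stands in for the OLAP roll-up.

```python
import re
from collections import Counter

# Hypothetical taxonomy: concept -> parent concept (None marks the root).
TAXONOMY = {
    "science": None,
    "physics": "science",
    "biology": "science",
    "optics": "physics",
}


def term_weights(page_text, terms):
    """Count occurrences of each taxonomy term in one output page.

    Assumption: the weight of a term is its raw occurrence count.
    """
    words = Counter(re.findall(r"[a-z]+", page_text.lower()))
    return {term: words[term] for term in terms if words[term] > 0}


def concept_metrics(pages, taxonomy):
    """Sum term weights over all pages and roll them up to every ancestor."""
    totals = Counter()
    for text in pages:
        for term, weight in term_weights(text, taxonomy).items():
            node = term
            while node is not None:  # walk up to the root of the taxonomy
                totals[node] += weight
                node = taxonomy[node]
    return dict(totals)


# Hypothetical output pages served to visitors.
pages = ["Optics and physics news", "Biology lab: biology courses"]
metrics = concept_metrics(pages, TAXONOMY)
print(metrics)
```

In this sketch a concept's metric includes the weights of all its descendants, so a manager can read audience figures at whatever level of the taxonomy is useful.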
