Data mining research has long concentrated on five main areas: clustering, association discovery, classification, forecasting and sequential patterns. Web data mining projects are concerned mainly with text mining, user segmentation, forecasting web usage and analyzing users' clickstream patterns. We present a new type of web usage mining called funnel analysis, or funnel report mining. A funnel report is a study of retention behavior across a series of pages or sites. For example, of all hits on the home page of www.msn.com, what percentage is followed by a hit to moneycentral.msn.com? What percentage of www.msn.com hits is followed by moneycentral.msn.com and then www.msnbc.com? What are the most interesting funnels starting with www.msn.com? Where does the greatest drop-off rate occur after a user has hit MSNBC? Funnel reports are extremely useful in e-business because they give product planners an idea of how usable and well-structured their site is. In our experience performing web usage mining for the MSN network of sites, funnel reports are requested even more often than user segmentation analyses, site affiliation studies and classification exercises. In this paper, we define a framework for funnel analysis and describe a tree-based solution that we have been using successfully to extract all relevant funnels with only one scan of the data file.
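To make the idea of a funnel report concrete, the following is a minimal sketch of how funnel counts might be gathered in a single pass using a prefix tree. It is an illustration under stated assumptions, not the framework defined in the paper: the names `FunnelNode`, `build_funnel_tree`, `funnel_counts` and `retention` are hypothetical, the input is assumed to be already split into per-session page lists, "followed by" is assumed to mean "immediately followed by within the same session", and the sample sessions are invented for the example.

```python
from collections import defaultdict


class FunnelNode:
    """One node of the prefix tree: a page reached via a particular path."""

    def __init__(self):
        self.count = 0  # number of times this exact path was observed
        self.children = defaultdict(FunnelNode)


def build_funnel_tree(sessions, max_depth=3):
    """Scan the sessions once and count every consecutive page path.

    Assumption: each session is a list of page names in visit order.
    Paths longer than max_depth are not tracked.
    """
    root = FunnelNode()
    for pages in sessions:  # single scan of the data
        for start in range(len(pages)):
            node = root
            for page in pages[start:start + max_depth]:
                node = node.children[page]
                node.count += 1
    return root


def funnel_counts(root, funnel):
    """Return the hit count at each step of the given funnel."""
    counts = []
    node = root
    for page in funnel:
        node = node.children.get(page)  # .get avoids creating new nodes
        if node is None:
            # Path never observed: remaining steps all have zero hits.
            counts.extend([0] * (len(funnel) - len(counts)))
            break
        counts.append(node.count)
    return counts


def retention(root, funnel):
    """Return each step's count as a fraction of the first step's count."""
    counts = funnel_counts(root, funnel)
    base = counts[0]
    return [c / base if base else 0.0 for c in counts]


# Invented sample data, for illustration only.
sessions = [
    ["www.msn.com", "moneycentral.msn.com", "www.msnbc.com"],
    ["www.msn.com", "moneycentral.msn.com"],
    ["www.msn.com", "www.msnbc.com"],
    ["www.msn.com"],
]
tree = build_funnel_tree(sessions)
funnel = ["www.msn.com", "moneycentral.msn.com", "www.msnbc.com"]
print(funnel_counts(tree, funnel))  # [4, 2, 1]
print(retention(tree, funnel))      # [1.0, 0.5, 0.25]
```

In this sketch, every funnel up to `max_depth` pages long can be read off the tree after the scan, so questions such as "where is the greatest drop-off" reduce to comparing the counts of a node and its children.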