INSITE: A Tool for Real-Time Knowledge Discovery from Users Web Navigation *

The major challenges in web mining are a) tracking the data accurately (as not everything is reported to the web server), b) acquiring the huge volume of data in real time (435 million visits to Yahoo per day, 2-4 GB of clickstream data per hour), c) interpreting the data in real time (on the order of seconds for personalization and targeting information) without compromising the privacy of the user, and d) visualizing the data to facilitate policy making. To address these challenges, we demonstrate an integrated software platform called INSITE, designed a) to accurately track users' interactions with a web space with minimum overhead and no voluntary user participation, b) to generate individual and aggregate user profiles in real time (or off-line) through the use of a unique Connectivity Matrix Model (CM-model), c) to show the efficacy and scalability of the CM-model in capturing the essence of the users' participatory attributes in the context of the web, d) to visualize the results of clustering users' navigation paths in real time by leveraging the CM-model, and e) to execute a suite of queries (including temporal ones) and prove the utility of the captured data in making meaningful decisions about user interaction with a web site.