Semantic clustering of information systems' users with stochastic techniques

We use a Markovian model to capture the habitual user profiles of an information access system. In this model, both the general profile and the individual profile of each user are captured as a Markovian process whose states are the keywords submitted to the system by the users and whose state-to-state transitions correspond to the order in which these keywords appeared in the queries. Under this model, the probabilistic locality of the Markovian state space translates into semantic locality of the corresponding keywords, so that a clustering of the Markovian state space corresponds to a semantic clustering of the keyword space. Since the states represent keywords asked by the users, the state space can grow very large; at the same time, it is partitioned into disjoint subsets such that states within the same subset interact strongly, while states in different subsets interact only weakly. We exploit this structure to cluster the large state space effectively and reveal the corresponding semantic keyword clusters. We then define a semantic distance between the various user profiles that can be used to cluster the user space on the basis of keyword usage and keyword semantic relevance. The resulting clustering achieves a high degree of independence from the raw data: for example, users who never asked a common keyword may end up very close to each other if their keywords were asked together by many other users.
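The construction and state-space clustering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each query is an ordered list of keywords and counts a transition only between consecutive keywords of the same query. It also substitutes a simple heuristic for the paper's clustering procedure: transitions whose probability is at most a hypothetical threshold `epsilon` are treated as weak and dropped, and the connected components of what remains are taken as keyword clusters. The function names, keywords, and queries are invented for illustration.

```python
from collections import defaultdict


def transition_matrix(queries):
    """Estimate keyword-to-keyword transition probabilities.

    Assumption: each query is an ordered list of keywords, and a transition
    src -> dst is counted whenever dst immediately follows src in the same
    query. Each row is normalised so its probabilities sum to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query in queries:
        for src, dst in zip(query, query[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def cluster_states(probs, epsilon):
    """Split the state space into strongly interacting groups.

    Heuristic stand-in (not the paper's method): transitions with
    probability <= epsilon are treated as weak and dropped; the remaining
    transitions are read as undirected edges, and each connected component
    of that graph is returned as one keyword cluster.
    """
    states = set(probs)
    for row in probs.values():
        states.update(row)
    neighbours = {s: set() for s in states}
    for src, row in probs.items():
        for dst, p in row.items():
            if p > epsilon and src != dst:
                neighbours[src].add(dst)
                neighbours[dst].add(src)
    clusters, seen = [], set()
    for start in sorted(states):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search over strong edges
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbours[node] - seen)
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    # Hypothetical query log: two topics joined by one weak transition.
    queries = [
        ["jaguar", "panther"],
        ["panther", "jaguar"],
        ["panther", "jaguar"],
        ["panther", "jaguar"],
        ["panther", "stock"],
        ["stock", "bond"],
        ["bond", "stock"],
    ]
    P = transition_matrix(queries)
    clusters = cluster_states(P, epsilon=0.3)
    print(sorted(sorted(c) for c in clusters))
    # prints [['bond', 'stock'], ['jaguar', 'panther']]
```

With these sample queries, the single panther-to-stock transition has probability 0.25, so at `epsilon=0.3` it counts as a weak interaction and the two topics stay apart; lowering `epsilon` to 0.2 merges all four keywords into one cluster.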

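The user-distance step described above can be illustrated with a small sketch. The abstract does not give the formula for the semantic distance, so this sketch substitutes a simple stand-in: each user is represented by the share of their keyword usage that falls in each semantic cluster, and two users are compared with the total-variation distance between those shares. The `cluster_of` mapping, the function names, and the example users are hypothetical.

```python
from collections import Counter


def cluster_profile(keywords, cluster_of):
    """Fraction of a user's keyword occurrences that falls in each cluster.

    Keywords missing from `cluster_of` are ignored.
    """
    counts = Counter(cluster_of[k] for k in keywords if k in cluster_of)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def profile_distance(user_a, user_b, cluster_of):
    """Total-variation distance between two users' cluster profiles.

    Stand-in for the paper's semantic distance. When both users have at
    least one clustered keyword, 0.0 means identical cluster usage and 1.0
    means their keywords fall in disjoint sets of clusters.
    """
    pa = cluster_profile(user_a, cluster_of)
    pb = cluster_profile(user_b, cluster_of)
    clusters = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) for c in clusters)


if __name__ == "__main__":
    # Hypothetical cluster assignment, e.g. output of a state-space clustering step.
    cluster_of = {"jaguar": 0, "panther": 0, "stock": 1, "bond": 1}
    alice = ["jaguar", "jaguar"]
    bob = ["panther"]  # shares no keyword with alice
    carol = ["stock", "bond"]
    print(profile_distance(alice, bob, cluster_of))    # 0.0
    print(profile_distance(alice, carol, cluster_of))  # 1.0
```

Alice and Bob share no keyword, yet their distance is 0.0 because their keywords fall in the same cluster, which mirrors the effect the abstract describes for users whose keywords are frequently asked together by others.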