Top-k queries over web applications

The core logic of web applications that offer a particular service, such as online shopping or other e-commerce, is typically captured by Business Processes (BPs). Among all the (possibly infinitely many) execution flows of a BP, analysts are often interested in identifying the flows that are “most important” according to some weight metric. The goal of the present paper is to provide efficient algorithms for top-k query evaluation over the possible executions of Business Processes, under a given weight function. Unique difficulties in top-k analysis in this setting stem from the facts that (1) the number of possible execution flows of a given BP is typically very large, or even infinite in the presence of recursion, and (2) the weights (e.g., likelihood, monetary cost) induced by actions performed during the execution (e.g., a product purchase) may be inter-dependent (due to probabilistic dependencies, combined discount deals, etc.). We exemplify these difficulties and overcome them to provide efficient algorithms for query evaluation where possible. We also describe in detail an application prototype that we have developed, based on our model and algorithms, for recommending optimal navigation in an online shopping web site.
