Goal attainment on long tail web sites: An information foraging approach

The long tail has attracted substantial theoretical and practical interest, yet few empirical studies have explicitly examined the factors that drive online conversions at long tail sites. This research tests several hypotheses derived from Information Foraging Theory (IFT) that pertain to goal achievement on long tail Web sites. IFT introduced the concepts of information patches and information scent to model the information-seeking behavior of individuals, but it has mostly been tested in production rule environments where the theory is used to simulate user behavior. Testing IFT-driven hypotheses on real data required learning information patches and scents using an inductive approach, and in this paper we adapt existing algorithms for these discovery tasks. Our results, based on clickstream data from forty-seven small business Web sites, show both the existence of valuable information patches and information scent trails and their importance in explaining conversion on these sites. The majority of the hypotheses were supported, and we discuss the implications for researchers and practitioners.
