the model might judge it to have considered how emotions impact decisions about family budgeting. The article would be classified as a mixture of both the emotional decision-making and family decision-making topics, with the precise mix determined by how heavily each abstract is weighted toward each topic. (3) The model also creates an assignment of specific words from each featured topic to the article. For example, articles dealing with the topic methodological issues use the words validity, structure, and alternative with much greater frequency than articles that focus on other topics.

The idea behind the probabilistic approach is that we imagine articles were generated randomly from a hidden structure (Griffiths and Steyvers 2004). (Imagine consumer research being written by monkeys drawing words randomly from urns, with the urns as the hidden structure.) Our probabilistic topic model uses the Dirichlet distribution to estimate the probability that any given hidden structure generated an abstract containing the words we see. The model thus uncovers the hidden structure most likely to have generated the data we observe; superior models are those that generate data relatively similar to the actual data. This approach rests on the idea that each article is composed of a mixture of topics, and each topic has certain words associated with it. For example, one topic has words such as social, identity, and group most closely associated with it. The topics are unnamed in the model, but after reviewing the words associated with each topic, we named each topic to ease the reader's understanding. For example, we named the topic most closely associated with the words social, group, and identity "social identity and influence."

The number of topics represented in the hidden structure was chosen by minimizing perplexity. To do this, our program splits the data into two subsets. The first subset is used to train a model of the topic structure, and the effectiveness of this model is then evaluated on the second subset. Perplexity measures how surprised the trained model is by the held-out data: it can be interpreted as the effective number of equally probable word choices the model faces when predicting each word in the evaluation subset, so high perplexity marks instances where the model is not confident in its predictions. The average surprise over the words in the evaluation subset is computed for each candidate model, and the model with the lowest perplexity was chosen.

Our model identified the 16 topics shown in table 1, along with their most representative terms and the name we assigned to each topic. The model identifies words specifically associated with each topic: words with a much higher probability of occurring in articles on that topic than their average probability of appearing across all the data. This allows us to ignore common words, for example, "the," "a," and "we," which are widely used in all abstracts and so are not specifically related to any topic. We list the 20 most representative words for each topic, ranked from most to least representative. Several words are representative of more than one topic, leaving only 235 unique words across the 16 topics. For example, evaluation is found in the memory, contextual effects, and satisfying customers topics. Figure 2 shows the words associated with each topic.
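To make the procedure concrete, a minimal sketch of the perplexity-based selection and top-word extraction in Python might look as follows. It is illustrative only, not a record of our implementation: the corpus loader load_abstracts, the 50/50 split, the candidate range of topic counts, and the use of scikit-learn's LatentDirichletAllocation are all assumptions.

```python
# Hypothetical sketch: choose the number of topics by minimizing
# perplexity on held-out abstracts, then list each topic's top words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split

abstracts = load_abstracts()  # hypothetical loader returning a list of abstract strings

# Common words ("the", "a", "we") appear in all abstracts and carry no
# topic information, so they are dropped as stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)

# Split the data: fit on one subset, evaluate on the other.
train, test = train_test_split(counts, test_size=0.5, random_state=0)

best_k, best_perplexity, best_lda = None, float("inf"), None
for k in range(5, 31):  # candidate topic counts (range is an assumption)
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(train)
    p = lda.perplexity(test)  # how "surprised" the model is by held-out words
    if p < best_perplexity:
        best_k, best_perplexity, best_lda = k, p, lda

# The 20 most representative words for each topic of the chosen model.
words = vectorizer.get_feature_names_out()
for topic in best_lda.components_:
    top20 = topic.argsort()[::-1][:20]
    print([words[i] for i in top20])
```

Lower perplexity on the held-out subset means the model is less surprised by unseen abstracts, which is why the loop retains the candidate with the smallest value.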
The size of each horizontal bar shows how representative the word is of the topic: the larger the bar, the more often we should expect to see the word in work on that topic. Thus search is highly representative of the search topic, and we should expect to see it regularly in research on that topic, whereas variety is moderately representative of the search topic and ambiguity only mildly so.
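A display of this kind can be produced directly from the fitted model. Continuing the hypothetical sketch above, the fragment below plots one topic's 20 most representative words as horizontal bars sized by each word's probability under that topic; the topic index and the plotting details are arbitrary choices.

```python
# Hypothetical continuation of the sketch above: bar length encodes how
# representative a word is of the chosen topic.
import numpy as np
import matplotlib.pyplot as plt

words = np.array(vectorizer.get_feature_names_out())
weights = best_lda.components_[0]        # unnormalized word weights for topic 0
probs = weights / weights.sum()          # normalize to P(word | topic)
top = np.argsort(probs)[::-1][:20]       # 20 most representative words

plt.barh(words[top][::-1], probs[top][::-1])  # larger bar = more representative
plt.xlabel("P(word | topic)")
plt.title("Most representative words for one topic")
plt.tight_layout()
plt.show()
```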