Pareto charting using multifield freestyle text data applied to Toyota Camry user reviews

This article proposes a method for Pareto charting based on unsupervised, freestyle text such as customer complaint, rework, scrap, or maintenance event descriptions. The procedure slightly extends the latent Dirichlet allocation method to form multifield latent Dirichlet allocation. The extension consists of using field-specific dictionaries for multifield databases and changing the recommended default prior settings. A numerical study motivates the choice of prior settings. A real-world case study of user reviews of Toyota Camry vehicles illustrates the practical value of the proposed methods. The results indicate that only 4% of the words written by Consumer Reports reviewers from the last 10 years relate to the widely publicized unintended acceleration issue. Copyright © 2012 John Wiley & Sons, Ltd.
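As a rough illustration of the final charting step only, the sketch below turns per-word topic assignments into the ranked shares and cumulative percentages that a Pareto chart displays. It assumes a topic model has already assigned each word to a topic. The function name `pareto_table`, the topic labels, and the toy counts are hypothetical and are not taken from the article, and the sketch does not implement the multifield extension or the prior settings.

```python
from collections import Counter


def pareto_table(topic_assignments, topic_labels):
    """Rank topics by word count and accumulate their percentage shares.

    topic_assignments: one topic id per word (assumed to come from a
        fitted topic model; the values used below are made up).
    topic_labels: mapping from topic id to a human-readable label.
    Returns a list of (label, count, share_percent, cumulative_percent).
    """
    counts = Counter(topic_assignments)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # most_common() yields topics from the largest count to the smallest,
    # which is the bar order of a Pareto chart.
    for topic, n in counts.most_common():
        share = 100.0 * n / total
        cumulative += share
        label = topic_labels.get(topic, f"topic {topic}")
        rows.append((label, n, share, cumulative))
    return rows


if __name__ == "__main__":
    # Hypothetical toy data: 10 words spread over three topics.
    assignments = [0] * 6 + [1] * 3 + [2] * 1
    labels = {0: "topic A", 1: "topic B", 2: "topic C"}
    for label, n, share, cum in pareto_table(assignments, labels):
        print(f"{label}: {n} words, {share:.1f}% (cumulative {cum:.1f}%)")
```

Each row corresponds to one bar of the chart, and the cumulative column gives the overlaid line. A statement such as "a given issue accounts for a small share of the words" would be read off the share column for the corresponding topic.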
