Topic Modeling and Transfer Learning for Automated Surveillance of Injury Reports in Consumer Product Reviews

Many modern firms and interest groups face the challenge of monitoring the status and performance of a wide range of distinct products. As the volume of online user-generated content has grown, new unstructured data sources have become available for mining unique insights. Reports of injuries arising from product use are particularly concerning. In this paper, we apply complementary approaches to this problem. We analyze two novel datasets: first, a government-maintained dataset of hazard and injury reports; second, a large dataset of cross-industry consumer product reviews manually coded for the presence of hazard and injury reports. We apply an unsupervised topic modeling approach to characterize the hazard and injury reports detected. We then implement a supervised transfer learning technique that uses information obtained from the government-maintained dataset to detect hazard and injury reports in online reviews. Our results offer improved surveillance for monitoring hazards across multiple industries.
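
The abstract does not specify which classifier is used in the transfer step. As a minimal sketch only, the transfer idea can be shown in its simplest form: train a text classifier on labeled source-domain reports, then apply it unchanged to target-domain reviews. This sketch swaps in multinomial naive Bayes as the learner and assumes a source corpus that contains both hazard and non-hazard examples. The class name `NaiveBayesTransfer`, the labels `hazard` and `safe`, and every example text are hypothetical. None of them come from either dataset, and this is not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTransfer:
    """Hypothetical multinomial naive Bayes: fit on source-domain reports,
    then applied unchanged to target-domain reviews (naive transfer)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for priors
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary is learned from the source domain only.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # words never seen in the source domain are ignored
                count = self.word_counts[c][tok]
                score += math.log((count + self.alpha) / (self.totals[c] + self.alpha * v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)


# Hypothetical source domain: short report-style texts with labels.
source_docs = [
    "child burned by overheating heater, skin injury",
    "toy battery caught fire and burned hand",
    "crib slat broke, infant trapped and injured",
    "product works as described, good value",
    "color matches the picture, fast shipping",
    "comfortable fit and easy to assemble",
]
source_labels = ["hazard"] * 3 + ["safe"] * 3

model = NaiveBayesTransfer().fit(source_docs, source_labels)

# Hypothetical target domain: consumer reviews the model never trained on.
target_reviews = [
    "my son burned his hand when the charger caught fire",
    "easy to assemble and the color is great",
]
for review in target_reviews:
    print(model.predict(review), "|", review)
# prints:
# hazard | my son burned his hand when the charger caught fire
# safe | easy to assemble and the color is great
```

Because the vocabulary comes only from the source domain, review words that never appear in the source reports contribute nothing to the score. More capable transfer learning methods adapt the model or its representation to the target domain, which this sketch does not attempt.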
