Latent Dirichlet Allocation (LDA) for Topic Modeling of the CFPB Consumer Complaints

Abstract. The Consumer Financial Protection Bureau (CFPB), created by Congress under the Dodd-Frank Act of 2010 and in operation since 2011, receives and processes consumer complaints about a wide range of financial services. Every complaint narrative provides insight into the problems that consumers are experiencing. As the number of CFPB complaint narratives grows, manual review of these documents by human experts is no longer feasible. An intelligent system is therefore needed to analyze the narratives automatically and surface useful knowledge for the experts. In this paper, we propose an approach based on latent Dirichlet allocation (LDA) to analyze the CFPB consumer complaints. The proposed approach aims to extract latent topics from the CFPB complaint narratives and to explore how the prevalence of those topics changes over time. These time trends are then used to evaluate how effective the CFPB's regulations and its expectations of financial institutions have been in creating a consumer-oriented culture. A partnership between the proposed approach and the CFPB experts could improve the consumer experience by making investigations of complaint narratives more efficient and effective.
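To make the two steps named in the abstract concrete (extracting latent topics, then tracking them over time), the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the inference method, so collapsed Gibbs sampling is assumed here, and the function names `fit_lda` and `topic_trends`, the hyperparameter values, and the toy narratives are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]   # topic-word counts
    nk = [0] * num_topics                        # tokens assigned to each topic
    z = []                                       # topic assignment per token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)        # random initial assignment
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], wid[w]
                # Remove the token, then resample its topic from the
                # conditional distribution given all other assignments.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed estimates of the per-document and per-topic distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(ndk, docs)
    ]
    topic_word = [
        [(c + beta) / (nk[k] + V * beta) for c in nkw[k]]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


def topic_trends(doc_topic, periods):
    """Average the document-topic proportions within each time period."""
    grouped = defaultdict(list)
    for theta, period in zip(doc_topic, periods):
        grouped[period].append(theta)
    return {
        p: [sum(col) / len(rows) for col in zip(*rows)]
        for p, rows in sorted(grouped.items())
    }


# Invented toy narratives (not CFPB data), each tagged with a month.
narratives = [
    ("2015-01", "bank charged overdraft fee on checking account"),
    ("2015-01", "overdraft fee charged twice on my checking account"),
    ("2015-02", "debt collector keeps calling about a debt i do not owe"),
    ("2015-02", "collector calling my work about old debt"),
]
periods = [month for month, _ in narratives]
docs = [text.split() for _, text in narratives]

doc_topic, topic_word, vocab = fit_lda(docs, num_topics=2)
trends = topic_trends(doc_topic, periods)
for period, proportions in trends.items():
    print(period, [round(p, 3) for p in proportions])
```

Averaging the per-document topic proportions within each month is only one simple way to form a trend line. A real analysis would also need text preprocessing (stop-word removal, stemming or lemmatization) and a principled choice of the number of topics, neither of which is shown here.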
