Text mining in insurance: from unstructured data to meaning

Every day, insurance companies collect an enormous quantity of text data from multiple sources. Using Natural Language Processing, we present a strategy for making beneficial use of the large amount of information available in documents. After a brief review of the basics of text mining, we describe a case study in which we analyze the accident narratives written by the researchers of the National Highway Traffic Safety Administration (NHTSA) of the U.S. Department of Transportation, aiming to extract latent information useful for fine-tuning policy premiums. The process has two steps. First, we classify the reports according to the relevance of their content to find the risk profile of the people involved. Next, we use these profiles to add new latent risk covariates to the ratemaking process for a company's customers.
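The two steps can be illustrated with a minimal sketch. It is not the method used in the case study: a small multinomial Naive Bayes classifier stands in for whatever classifier the study uses. The narratives, the labels "risky" and "standard", the policy records, and the covariate name `narrative_risk` are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a narrative and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(narratives, labels):
    """Count words per class; smoothing is applied at prediction time."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(narratives, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest add-one-smoothed log posterior."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def add_risk_covariate(records, model):
    """Step 2: attach a 0/1 covariate derived from each record's narrative."""
    return [
        {**rec, "narrative_risk": int(predict(model, rec["narrative"]) == "risky")}
        for rec in records
    ]


# Step 1: hypothetical labeled narratives used to fit the classifier.
train_texts = [
    "driver had been drinking alcohol before the crash",
    "driver was fatigued and fell asleep at the wheel",
    "vehicle stopped at the red light and was struck from behind",
    "driver was travelling within the speed limit on a dry road",
]
train_labels = ["risky", "risky", "standard", "standard"]
model = train_naive_bayes(train_texts, train_labels)

# Step 2: hypothetical policy records enriched with the new covariate.
policies = [
    {"policy_id": 1, "narrative": "the driver fell asleep after drinking alcohol"},
    {"policy_id": 2, "narrative": "vehicle stopped at the red light"},
]
enriched = add_risk_covariate(policies, model)
```

In a real ratemaking setting, the enriched records would then be passed, together with the usual rating variables, to the pricing model. That step is left out here.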
