A Semiparametric Gaussian Copula Regression Model for Predicting Financial Risks from Earnings Calls

An earnings call summarizes the financial performance of a company and is an important indicator of the company's future financial risks. We quantitatively study how earnings calls are correlated with financial risks, with a special focus on the financial crisis of 2009. In particular, we perform a text regression task: given the transcript of an earnings call, we predict the volatility of stock prices from the week after the call is made. We propose the use of copulas: a powerful statistical framework that separately models the uniform marginals and their complex multivariate stochastic dependencies, without requiring any prior assumptions about the distributions of the covariate and the dependent variable. By performing the probability integral transform, our approach moves beyond the standard count-based bag-of-words models in NLP and improves on previous work on text regression by incorporating the correlation among local features in the form of a semiparametric Gaussian copula. In experiments, we show that our model significantly outperforms strong linear and non-linear discriminative baselines on three datasets under various settings.
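To make the probability integral transform mentioned above concrete, the following is a minimal sketch, not the paper's implementation. It assumes a rank-based empirical CDF for each marginal (the abstract does not say how the marginals are estimated), and the helper names `empirical_cdf_transform`, `to_gaussian_scores`, and `pearson` are hypothetical. Each variable is mapped to approximately uniform values, then to standard normal scores, and the correlation of those scores stands in for one entry of a Gaussian copula correlation matrix.

```python
from statistics import NormalDist
from math import sqrt


def empirical_cdf_transform(values):
    """Rank-based probability integral transform: value -> rank / (n + 1).

    Dividing by n + 1 keeps every result strictly inside (0, 1).
    Ties are broken by input order (an assumption of this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def to_gaussian_scores(values):
    """Map a sample to standard normal scores via the inverse normal CDF."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(p) for p in empirical_cdf_transform(values)]


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: a word count and a volatility-like target that is
# a monotone but non-linear function of it.
word_count = [1.0, 2.0, 3.0, 4.0, 5.0]
volatility = [c ** 3 for c in word_count]

z_count = to_gaussian_scores(word_count)
z_vol = to_gaussian_scores(volatility)
copula_corr = pearson(z_count, z_vol)
```

Because the transform depends only on ranks, the monotone non-linear relationship in the toy data becomes a perfect linear one in the normal-score space, which illustrates why no distributional assumption on the raw counts is needed.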
