Statistical inferences for polarity identification in natural language

Information forms the basis for all human behavior, including the ubiquitous decision-making that people constantly perform in their everyday lives. It is thus the mission of researchers to understand how humans process information to reach decisions. To facilitate this task, this work proposes LASSO regularization as a statistical tool for extracting decisive words from textual content and for studying the reception of granular expressions in natural language. This differs from the usual use of the LASSO as a predictive model and instead yields highly interpretable statistical inferences about the relationship between the occurrences of words and an outcome variable. Accordingly, the method has direct implications for the social sciences: it serves as a statistical procedure for generating domain-specific dictionaries, as opposed to the heuristics that are frequently employed. In addition, researchers can identify text segments and word choices that are statistically decisive to authors or readers and, based on this knowledge, test hypotheses from behavioral research.
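The idea of using the LASSO to select decisive words can be illustrated with a minimal sketch. It is not the authors' implementation. The sketch assumes raw word counts as features, an already observed numeric outcome per document, and a hand-written coordinate-descent solver for the objective (1/(2n))·||y − Xβ||² + λ·||β||₁. The toy documents, the outcome values, the function names `soft_threshold` and `select_words`, and the choice of λ are all hypothetical; a real study would tune λ (for example by cross-validation) and might weight the word counts differently.

```python
from collections import Counter


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def select_words(docs, y, lam, n_sweeps=200):
    """Return {word: coefficient} for the words that the LASSO keeps.

    Minimises (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent, where X holds centred word counts.
    """
    n = len(docs)
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))

    # Centre the outcome and every word column so no intercept is needed.
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residuals while all coefficients are 0
    cols = []
    for word in vocab:
        raw = [c[word] for c in counts]
        mean = sum(raw) / n
        cols.append([v - mean for v in raw])

    beta = [0.0] * len(vocab)
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            scale = sum(x * x for x in col) / n
            if scale == 0.0:
                continue  # word occurs equally often in every document
            # Covariance of word j with the residual that excludes word j.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / scale
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(col, resid)]
                beta[j] = new
    return {w: b for w, b in zip(vocab, beta) if b != 0.0}


# Hypothetical toy corpus: short news snippets and an observed outcome
# (for instance a market reaction); the numbers are made up.
DOCS = [
    "profit growth strong",
    "profit outlook raised",
    "profit margin stable",
    "loss widened weak",
    "outlook cut weak",
    "margin cut outlook",
]
Y = [1.0, 0.9, 0.6, -1.0, -0.8, -0.7]

dictionary = select_words(DOCS, Y, lam=0.1)
for word, coef in sorted(dictionary.items(), key=lambda kv: kv[1]):
    print(f"{word:>8s} {coef:+.3f}")
```

Words that are missing from the returned dictionary received a coefficient of exactly zero, which is what makes the result readable as a domain-specific word list with signed weights. A larger λ gives a shorter list: on this toy data, `lam=0.35` leaves only "profit".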
