Corporate Social Responsibility Reports: Understanding Topics via Text Mining

This study uses Text Data Mining (TDM) to analyze the contents of Corporate Social Responsibility (CSR) reports. The goal is to find evidence that environmental sustainability has become embedded in the corporate policy and core business discourse of seven organizations over 2004-2012. Results from supervised modeling techniques suggest that environmental qualities are embedded in the business discourse. Unsupervised techniques provide additional support for embeddedness, as business topics increasingly tend to group with environmental ones. The process we outline should facilitate pattern discovery in documents, minimizing or eliminating the need for the time-consuming content analysis that is frequently used in qualitative research. To our knowledge, this is one of the first attempts to apply TDM to analyze unstructured data from CSR reports.
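To make the idea of business terms "grouping with" environmental ones more concrete, the following is a minimal sketch. It is not the pipeline used in the study. It assumes a simple TF-IDF weighting and cosine similarity between term vectors. The three documents are invented toy text, and the function names (`tokenize`, `tfidf_rows`, `term_similarity`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_rows(docs):
    """Return one dict per document mapping term -> TF-IDF weight.

    Assumed weighting: relative term frequency times (log(N / df) + 1).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        rows.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return rows


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def term_similarity(rows, term_a, term_b):
    """Compare two terms by their weights across all documents."""
    vec_a = [row.get(term_a, 0.0) for row in rows]
    vec_b = [row.get(term_b, 0.0) for row in rows]
    return cosine(vec_a, vec_b)


# Hypothetical toy corpus, invented for illustration only.
toy_docs = [
    "emissions reduction lowers operating cost and supports revenue growth",
    "revenue growth depends on energy efficiency and emissions targets",
    "employee volunteering and community donations",
]

rows = tfidf_rows(toy_docs)
env_vs_business = term_similarity(rows, "emissions", "revenue")
env_vs_community = term_similarity(rows, "emissions", "donations")
print(env_vs_business, env_vs_community)
```

In this toy setting, an environmental term and a business term that appear in the same documents receive a high similarity, while terms that never co-occur receive zero. Tracking such similarities year by year is one simple way to look for the kind of increasing grouping the abstract describes, although the study's actual techniques may differ.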
