Introduction to Data Analytics and Data Mining for Social Media Minitrack

Taking the data deluge from social media, in its many forms, and transforming it into useful information, or knowledge, is the essence of this minitrack. The papers at HICSS in 2015 remind our attendees and readers of the many real-world applications of data analytics and data mining for social media. Last year, for example, the papers showed how to aggregate data to discover the location of tweeters as well as how source code and prose can be combined to yield more valuable Stack Overflow posts. In 2013, our papers explored conducting a social outreach campaign to help prevent child abuse, and also analyzing travel patterns based on geo-tagged photographs.

This minitrack begins with “A meta-analysis of theories and topics in social media research,” by Van Osch and Coursaris. This study provides an overview of the identity and intellectual core of social media research with respect to its prevailing theory-related practices. As such, it gives us an important opportunity to pause and reflect on what social media scholars have accomplished to date. The findings provide a benchmark for tracking the state of the social media domain, while focusing our attention on topics and theories requiring further inquiry.

The second paper explores whether Twitter trends can predict election results, based on evidence from the 2014 general election in India. Khatua, Khatua, Ghosh and Chaki show that tweet volume and sentiment analysis can predict election results. This author team also finds that sentiment scoring can predict changes in voter share, even during an election. Finally, this study reaffirms the need for contextual understanding in forecasting election results.

The third paper, “TreeQueST: A treemap-based query sandbox for microdocument retrieval,” is by Thom and Ertl. This University of Stuttgart team shows that the rise of online microdocument platforms, such as Twitter, has brought new relevance to techniques for finding and understanding information about recent events. The authors present and evaluate an approach for doing just this. Their novel approach builds on hierarchical topic clustering combined with a treemap-based visualization.

“Viewer engagement in movie trailers and box office revenue,” by Oh, Ahn and Baek, examines the antecedents of sharing movie trailers. Furthermore, this study assesses the impact on revenue of the consumption of, and commentary on, movie trailers. This work also highlights the practical challenges in attributing changes in box office revenue to consumer engagement on social media.

The fifth paper, “Identifying uptake, sessions, and key actors in a socio-technical network,” is by Suthers and Dwyer. These authors address two challenges arising from the nature of participation in socio-technical networks. First, since learning and knowledge production take place through both individual and collective agency, it is necessary to understand aggregate phenomena (e.g., “ties”, “roles”, and “communities”) as both produced by and providing the setting for interactional events. Second, participant interaction is distributed across media, places and time, often resulting in separate traces of interaction that fragment participants’ unitary experience.

“Measuring NBA players’ mood by mining athlete-generated content,” by Xu and Yu, proposes a framework based on athlete-generated content and assesses the relationship between an athlete’s mood and their individual performance. At the core of this framework is the use of Twitter as a source of intelligence and the choice of sentiment analysis to mine players’ moods from their tweets. This research also demonstrates the utility and limitations of mining user-generated unstructured text to predict outcomes such as individual performance.

Our final paper is “Estimating centrality statistics for complete and sampled networks: Some approaches and complications” by Lee and Pfeffer. This Carnegie Mellon author team presents approximation techniques that exploit any tractable relationship between the measures and network characteristics such as size and density. They find that distinct functional relationships exist between the network statistics of complex “slow” measures and “fast” measures, such as the linkage between betweenness centrality and network density. Furthermore, they track how these relationships scale with network size.

2015 48th Hawaii International Conference on System Sciences