Mining online community data: The nature of ideas in online communities

Abstract Ideas are essential for innovation and for the continuous renewal of a firm’s product offerings. Previous research has argued that online communities contain such ideas. Online communities such as forums, Facebook groups, and blogs are therefore potential gold mines of innovative ideas that can be used to boost the innovation performance of the firm. However, the nature of online community data makes idea detection labor intensive. In response to this problem, research has shown that it might be possible to detect ideas in online communities automatically. Research has, however, yet to explain what makes such automatic idea detection possible. Our study is based on two datasets of dialogue between members of two distinct online communities. The first community is related to beer. The second is related to Lego. We build machine learning classifiers based on Support Vector Machines and Partial Least Squares that can detect ideas in each respective online community. We use Partial Least Squares to investigate which words and expressions allow for the automatic classification of ideas. We conclude that ideas from the two online communities contain suggestion/solution words and expressions, and that it is these that make automatic idea detection possible. In addition, we conclude that the ideas in the beer community seem to be related to the brewing process, whereas the ideas in the Lego community seem to be related to new products that consumers would want.
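
The role of Partial Least Squares in pointing to the words that separate ideas from non-ideas can be illustrated with a minimal sketch. It is not the study's implementation: the six posts and their labels are invented for illustration, the tokenizer is a plain whitespace split, only a single PLS component is fitted, and the 0.5 decision threshold and all function names are assumptions.

```python
from collections import Counter

# Hypothetical toy posts, invented for illustration (NOT the study's data).
# Label 1 = contains an idea, 0 = does not.
posts = [
    ("you could try adding hops later in the boil", 1),
    ("i suggest they should make a bigger set", 1),
    ("maybe you could try colder fermentation", 1),
    ("i bought this set last week", 0),
    ("this beer was great last night", 0),
    ("thanks for the photos of the set", 0),
]


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for the sketch.
    return text.lower().split()


vocab = sorted({tok for text, _ in posts for tok in tokenize(text)})


def bag_of_words(text):
    # Word counts over the training vocabulary; unseen words are ignored.
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


X = [bag_of_words(text) for text, _ in posts]
y = [float(label) for _, label in posts]
n, p = len(X), len(vocab)

# Center the word counts and the class labels.
x_mean = [sum(row[j] for row in X) / n for j in range(p)]
y_mean = sum(y) / n
Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
yc = [value - y_mean for value in y]

# First PLS component: the weight vector is proportional to X^T y,
# i.e. the covariance of each word with the idea label.
w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
norm = sum(value * value for value in w) ** 0.5
w = [value / norm for value in w]

# Scores of the training posts on that component, then regress y on the scores.
t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
q = sum(t[i] * yc[i] for i in range(n)) / sum(value * value for value in t)


def predict_score(text):
    x = bag_of_words(text)
    t_new = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
    return y_mean + q * t_new


def is_idea(text):
    # Assumption: classify as an idea when the fitted score exceeds 0.5.
    return predict_score(text) > 0.5


# Words with the largest positive weights are most associated with ideas.
top_words = sorted(zip(vocab, w), key=lambda pair: pair[1], reverse=True)[:3]

if __name__ == "__main__":
    print(top_words)
    print(is_idea("you could try a darker malt"))
```

In this toy setting the largest weights fall on suggestion-type words, which mirrors the kind of inspection the abstract describes. A real analysis would use a full text-mining pipeline, more than one component, and cross-validation.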
