Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating Ideas Using a Textual Dataset

Data mining project managers can benefit from using standard data mining process models. Standard process models, such as the de facto standard and most popular one, the Cross-Industry Standard Process for Data Mining (CRISP-DM), reduce cost and time. They also facilitate knowledge transfer, promote the reuse of best practices, and minimize knowledge requirements. At the same time, digital innovation is increasingly needed to unlock the potential of ever-growing textual data such as publications, patents, social media data, and documents of various forms. Furthermore, the introduction of cutting-edge machine learning tools and techniques enables the elicitation of ideas. The processing of unstructured textual data to generate new and useful ideas is referred to as idea mining. Existing literature on idea mining largely overlooks the use of standard data mining process models. Therefore, the purpose of this paper is to propose a reusable model for generating ideas, CRISP-DM for Idea Mining (CRISP-IM). The CRISP-IM was designed and developed following the design science approach. It facilitates idea generation through Dynamic Topic Modeling (DTM), an unsupervised machine learning technique, and subsequent statistical analysis on a dataset of scholarly articles. The adapted CRISP-IM can be used to guide the process of identifying trends and eliciting ideas from scholarly literature datasets, temporally organized patent datasets, or any other textual dataset from any domain. The ex-post evaluation of the CRISP-IM is left for future study.
