Use of classification trees and rule-based models to optimize the funding assignment to research projects: A case study of UTPL

Abstract In the process of funding research projects, two factors must be assessed. First, experts judge the potential value of a project. Second, the applicant's research ability is judged from their previous research activity. Deciding how much money to assign to each project proposal is always difficult. This work focuses on the second factor: it classifies researchers' previous research activity with an automated logical classification (accepted, rejected), which resolves conflicts of interest between administration and applicants and supports the decision-making process. Because the class in this kind of study is usually unbalanced, with fewer accepted projects than rejected ones, several resampling methods are used to investigate how training on an imbalanced or a balanced dataset affects the models. Several tree-based and rule-based machine learning techniques are then used to create classification models from information on the faculty members of the “Technical Particular University of Loja (UTPL),” for both the balanced and the unbalanced datasets. Multivariate analysis, feature selection, algorithm parameter tuning and validation methods are used to achieve robust classification models. The most accurate results are obtained with a rule-based model built with the C5.0 algorithm. As this model provides acceptable accuracy, close to 95 % when predicting both classes and close to 99 % when predicting the accepted-projects class, both the methodology and the final model are validated.
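As an illustration of the class-balancing step, the following is a minimal sketch of random oversampling, one of the simplest resampling strategies. The abstract does not name the resampling methods actually used, so this is not a reproduction of the authors' procedure; the function name `random_oversample`, the feature values, and the toy labels are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (illustrative only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows belonging to this class; duplicates are drawn from here.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: two made-up features per researcher,
# with fewer "accepted" than "rejected" cases.
rows = [[3, 1], [0, 0], [1, 0], [5, 2], [0, 1], [2, 0]]
labels = ["rejected", "rejected", "rejected", "accepted", "rejected", "rejected"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

In practice, resampling of this kind is applied only to the training portion of the data, so that duplicated cases do not leak into the validation set and inflate the measured accuracy.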
