On Machine Learning with Imbalanced Data and Research Quality Evaluation Methodologies

This article presents a synoptic review of machine learning techniques for imbalanced data, together with a class of corresponding learning algorithms. The class comprises four meta-algorithms: cost-sensitive learning, MetaCost, Rotation Forest with cost-sensitive learning, and Rotation Forest with SMOTE. The four algorithms are compared in the WEKA environment, using the base classifiers J48 and PART, the F-measure as the evaluation metric, and a predetermined imbalanced data set, and comparative numerical results are reported. The basic concepts of research quality evaluation methodologies are then presented, along with an adaptive qualitative-quantitative citation approach and advanced bibliometric indicators. Basic components of research quality performance are considered, namely cited journal publications, citing publications, and research quality evaluations at various academic levels, and corresponding numerical results are given. Finally, an alternative approach is proposed that applies certain machine learning algorithms for imbalanced data to research quality evaluation.
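The choice of the F-measure rather than plain accuracy matters on imbalanced data, and a small worked example may make this concrete. The sketch below is illustrative only: the confusion-matrix counts are invented for the example and are not results from the article, and the helper names `f_measure` and `accuracy` are hypothetical.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure of the positive (minority) class from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(tp, fp, fn, tn):
    """Fraction of all instances classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Invented data set: 1000 instances, 50 of them in the minority class.
# Classifier A labels everything as the majority class.
acc_a = accuracy(tp=0, fp=0, fn=50, tn=950)
f_a = f_measure(tp=0, fp=0, fn=50)

# Classifier B finds 30 of the 50 minority instances, with 20 false alarms.
acc_b = accuracy(tp=30, fp=20, fn=20, tn=930)
f_b = f_measure(tp=30, fp=20, fn=20)

print(f"A: accuracy={acc_a:.2f}, F-measure={f_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, F-measure={f_b:.2f}")
```

Under these assumed counts the two classifiers have nearly identical accuracy, while the F-measure separates the one that never detects the minority class from the one that does.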
