PredPsych: A toolbox for predictive machine learning-based approach in experimental psychology research

Recent years have seen increased interest in machine learning-based predictive methods for analyzing quantitative behavioral data in experimental psychology. While these methods can achieve greater sensitivity than conventional univariate techniques, they still lack an established and accessible implementation. The aim of the current work was to build an open-source R toolbox, “PredPsych”, that makes these methods readily available to all psychologists. PredPsych is a user-friendly R toolbox based on machine-learning predictive algorithms. In this paper, we present the framework of PredPsych via the analysis of a recently published multiple-subject motion capture dataset. In addition, we discuss examples of research questions that can be addressed with the machine-learning algorithms implemented in PredPsych but cannot be easily addressed with univariate statistical analysis. We anticipate that PredPsych will be of use to researchers with limited programming experience not only in psychology but also in clinical neuroscience, enabling computational assessment of putative bio-behavioral markers for both prognosis and diagnosis.
