Using R and RStudio for Data Management, Statistical Analysis, and Graphics

Data Input and Output: Input; Output; Further resources
Data Management: Structure and metadata; Derived variables and data manipulation; Merging, combining, and subsetting datasets; Date and time variables; Further resources; Examples
Statistical and Mathematical Functions: Probability distributions and random number generation; Mathematical functions; Matrix operations; Examples
Programming and Operating System Interface: Control flow, programming, and data generation; Functions; Interactions with the operating system
Common Statistical Procedures: Summary statistics; Bivariate statistics; Contingency tables; Tests for continuous variables; Analytic power and sample size calculations; Further resources; Examples
Linear Regression and ANOVA: Model fitting; Tests, contrasts, and linear functions of parameters; Model results and diagnostics; Model parameters and results; Further resources; Examples
Regression Generalizations and Modeling: Generalized linear models; Further generalizations; Robust methods; Models for correlated data; Survival analysis; Multivariate statistics and discriminant procedures; Complex survey design; Model selection and assessment; Further resources; Examples
A Graphical Compendium: Univariate plots; Univariate plots by grouping variable; Bivariate plots; Multivariate plots; Special-purpose plots; Further resources; Examples
Graphical Options and Configuration: Adding elements; Options and parameters; Saving graphs
Simulation: Generating data; Simulation applications; Further resources
Special Topics: Processing by group; Simulation-based power calculations; Reproducible analysis and output; Advanced statistical methods; Further resources
Case Studies: Data management and related tasks; Read variable format files; Plotting maps; Data scraping; Text mining; Interactive visualization; Manipulating bigger datasets; Constrained optimization: the knapsack problem
Appendix A: Introduction to R and RStudio
Appendix B: The HELP Study Dataset
Appendix C: References
Appendix D: Indices
