Linear Regression Methods

Linear regression is a broad and well-developed area of statistics. If statistical methodology has a core, linear regression is it. Linear regression methods are ubiquitous in statistics and data analytics because it is easy to fit tractable models that describe the primary features of a process or population. Linear regression is useful not only for description but also for prediction, since the models often provide good approximations of complex relationships. In statistics, hypothesis tests and confidence intervals are routinely used in linear regression analyses. Extending these inferential methods to data science is often unsuccessful because opportunistically collected data are so prevalent. Most of the time, such data cannot support inferential methods because the quality of the resulting inferences is unknown. We discuss inference here so that the reader may understand the potential for both success and failure of these methods. The focus, however, is on the aspect of the subject that is most essential and useful for data analytics: the fitted models. The topic of linear regression also provides an avenue to gain experience with the statistical package R, one of the most popular software packages used by data scientists.