Intro to Regression: Part 1: What is regression? (generally speaking)

Regression analysis is a statistical procedure that examines the relationship between two or more variables in a dataset.  Regression relates a single "response" variable to one or more "predictor" variables, by way of a regression function.

The regression function takes the predictor variable(s) as argument(s) and returns the predicted value of the response variable.  The regression function is tuned to minimize the difference between the predicted and actual values of the response variable.
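To make this concrete, here's a minimal sketch in Python, using made-up data and a single predictor. The regression function is f(x) = b0 + b1*x, and "tuning" means choosing b0 and b1 by ordinary least squares, i.e. minimizing the sum of squared differences between predicted and actual responses:

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimizing sum((y - (b0 + b1*x))**2) over the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for one predictor
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Predictor and response values (invented purely for illustration)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

b0, b1 = fit_simple_linear(xs, ys)

def predict(x):
    """The tuned regression function: predicted response for a predictor value."""
    return b0 + b1 * x
```

Given a new predictor value, `predict` returns the model's best guess for the response; how good that guess is depends on how well the tuned line fits the data.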


Terminology

  • The "regression function" is also known as the "regression model", or simply the "model"
  • The "response variable" is also known as the "outcome variable" (as in, the "outcome" of the regression function)
  • The "predictor variable(s)" are also known as the "regressor variable(s)"

Types of regression models

The world of regression models can be divided (tautologically) into two types:

  1. Linear models
  2. Non-linear models

Linear models are (relatively) easy.  Non-linear models are hard.  Non-linear models are so hard that statisticians are always looking for ways to transform them into linear models.  
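One common transformation trick, sketched here with invented data: an exponential model y = a·e^(b·x) is non-linear in its parameters, but taking the logarithm of both sides gives ln(y) = ln(a) + b·x, which is linear, so ordinary least squares can recover the parameters:

```python
import math

# Hypothetical data generated from y = 2 * e^(0.5 * x), for illustration only
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]

# Log-transform the responses: the model becomes linear in (ln(a), b)
log_ys = [math.log(y) for y in ys]

# Ordinary least squares on the transformed data
n = len(xs)
mean_x = sum(xs) / n
mean_ly = sum(log_ys) / n
b = (sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = math.exp(mean_ly - b * mean_x)  # intercept of the line is ln(a)
```

Because the data here is noise-free, the fit recovers a = 2 and b = 0.5 exactly (up to floating point); with real, noisy data the transform also changes how the errors behave, which is one reason this trick needs care in practice.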

This blog series will focus exclusively on linear models.  Linear models are easier to analyze and interpret, and they are much less computationally complex to solve.  

That said, linear models carry with them a rather lengthy list of assumptions about the underlying data.  These assumptions are, in part, the reason that linear models are relatively easy to work with.  We'll cover these assumptions later on, after we've developed a basic understanding of regression itself.
