Everything you wanted to know about Linear Regression?

Mohamed Bakrey Mahmoud
3 min readJul 20, 2022

--

Introduction

First of all, I would like to have a look that artificial intelligence, “big data” and “machine learning” are some of the most searched scientific terms on the internet these days. Most of us are increasingly using AI in our daily lives and frequently, sometimes without even realizing that we are doing it. AI-based products are capable of performing human-like activities because machine learning algorithms act as their brain. Linear regression is one of the most popular machine learning algorithms.

What is the Linear Regression?

We use linear regression analysis to predict the value of one variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you use to predict the value of the other variable is called the independent variable.

The general idea of linear regression:

The general idea of linear regression:

It is based on two things:

(1) Does a set of predictor variables do a good job of predicting an outcome variable?

(2) What variables, in particular, are significant predictors of the outcome variable, and in what way — indicated by the size and sign of the beta estimates — affect the outcome variable? Regression estimates are used to explain the relationship between one dependent variable and one or more independent variables. The simplest form of a regression equation with one dependent variable and one independent variable is determined by the formula y = c + b * x, where y = the degree of the estimated dependent variable, c = a constant, b = the regression coefficient, and x = the degree on the independent variable.

Types of Linear Regression

  • simple linear regression

dependent variable (interval or ratio), 1 independent variable (interval, ratio, or dichotomous)

  • Multiple linear regression

dependent variable (interval or ratio), 2 + independent variables (interval, ratio or dichotomous)

  • Logistic regression

dependent variable (dichotomous), 2+ independent variable(s) (interval, ratio, or dichotomous)

  • Ordinal regression

dependent variable (ordinal), 1+ independent (nominal or dichotomous) variable(s)

  • polynomial regression

dependent variable (nominal), 1+ independent variable(s) (interval, ratio, or dichotomous)

  • Featured Analysis

dependent variable (nominal), 1+ independent variable(s) (interval or ratio)

When selecting the model for analysis, an important consideration is the fit of the model. Adding independent variables to a linear regression model will always increase the model’s explained variance (usually expressed in R²). However, overshooting can occur by adding too many variables to the model, which reduces the model’s generalizability. Occam’s code describes the problem very well — a simple model is usually preferred over a more complex one. Statistically, if a model includes a large number of variables, some variables will be statistically significant due to chance alone.

What are the advantages and disadvantages of logistic regression?

Advantages of logistic regression

1. Logistic regression performs well when the data set is linearly separable.

2. Logistic regression is less prone to overfitting but can increase in high-dimensional data sets. You should consider regulation techniques (L1 and L2) to avoid over-fitting in these scenarios

3. Logistic regression not only gives a measure of the predictor’s correlation (the magnitude of the modulus) but also gives the direction of its correlation (positive or negative).

4. Logistic regression is easier to implement and interpret and is highly effective training.

Disadvantages of logistic regression

1. The main limitation of logistic regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, data is rarely linearly separable. Most of the time the data will be a jumbled mess.

2. If the number of observations is less than the number of features, logistic regression should not be used, otherwise, it may lead to overuse.

3. Logistic regression can only be used to predict discrete jobs. Therefore, the dependent variable of the logistic regression is limited to the discrete set of numbers. This limitation is in and of itself problematic, as it prohibits continuous data prediction.

Mohamed B Mahmoud. Data Scientist.

--

--

Mohamed Bakrey Mahmoud
Mohamed Bakrey Mahmoud

No responses yet