Linear Regression: From Scratch!

Somya Maheshwari · Published in The Startup · Jan 8, 2021 · 5 min read


Linear Regression is one of the first algorithms you will learn when you begin your journey into the fields of Data Science and Machine Learning. Through this blog, we will not only understand what Linear Regression is but also implement the algorithm from scratch.

Now, before we dive into it, let us first understand what Regression is.

In simple terms, Regression is a method used to determine the relationship between a dependent variable and one or more independent variables. It lets you understand patterns in the data.

The independent variables are also known as features and the dependent variable is also known as the target.

Linear Regression is a basic supervised learning algorithm that assumes a linear relationship between the independent variables and the dependent variable. If there is only one independent variable, it is called simple linear regression; if there is more than one, it is called multiple linear regression.

Equation of a simple linear regression:

y = theta1 + theta2 * x

Here y is the dependent variable, theta1 is the intercept, theta2 is the slope, and x is the independent variable. Both theta1 and theta2 are regression coefficients.

[Figure: scatter plot of data points with the best-fit regression line. Image credits: Wikipedia]

In the figure above, to understand the trend in the data, we need to find the line that best fits it. To do so, we have to determine the values of the coefficients theta1 and theta2. But how do we do that?

There are two common methods: the Ordinary Least Squares method and the Gradient Descent approach.
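For reference, Ordinary Least Squares gives the coefficients in closed form. Here is a minimal sketch with NumPy; the helper ols_fit is purely illustrative and is not part of the implementation that follows:

import numpy as np

def ols_fit(x, y):
    # Closed-form OLS for simple linear regression: returns (slope, intercept)
    x_mean, y_mean = x.mean(), y.mean()
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    intercept = y_mean - slope * x_mean
    return slope, intercept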

Here, we will be implementing the Gradient Descent Approach, from scratch!

But what is the Gradient Descent approach?

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.

In our case, this differentiable function is the cost function, which measures the error in our calculated values of the coefficients theta1 and theta2. The error for a single training example is usually called the loss; averaged over the entire training set, it is called the cost function.

Squared Error Cost Function:

J(theta1, theta2) = (1/n) * Σ (y_i − ŷ_i)²,  where ŷ_i = theta1 + theta2 * x_i

Here n is the number of training examples, y_i is the actual value, and ŷ_i is the predicted value.

Our goal is to minimize this cost function so that we can obtain the most accurate values of theta1 and theta2.
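Concretely, gradient descent starts from some initial guesses for theta1 and theta2 and repeatedly nudges them in the direction that lowers the cost. Differentiating the cost function above with respect to each coefficient gives the update rules we will implement in code below (alpha is the learning rate):

∂J/∂theta2 = (-2/n) * Σ x_i * (y_i − ŷ_i)   (derivative with respect to the slope)
∂J/∂theta1 = (-2/n) * Σ (y_i − ŷ_i)         (derivative with respect to the intercept)

theta2 := theta2 − alpha * ∂J/∂theta2
theta1 := theta1 − alpha * ∂J/∂theta1

These two derivatives are exactly the D1 and D2 terms that appear in the training loop later on.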

Now that we are familiar with the basics, let’s get to coding!!

Don’t worry, we’ll go step by step.

We’ll start by importing the necessary libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

We’ll be using a very simple dataset to just get a clear idea of the implementation part.

You can obtain the dataset from here:
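Once downloaded, the data can be loaded into a DataFrame. A minimal sketch, assuming the dataset was saved locally as a CSV (the filename salary_data.csv here is just a placeholder for wherever you stored it):

data = pd.read_csv('salary_data.csv')  # placeholder filename; point this at your copy of the dataset
data.head()                            # peek at the first few rows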

This is what our data looks like. Pretty simple, right? Real-world data is nothing like that 😛.

It shows how an employee's salary varies with the number of years of work experience they have.

Replacing the column names with X and y for our convenience:

data = data.rename(columns={data.columns[0]: 'X', data.columns[1]: 'y'})

Splitting our data into training and test using train_test_split:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data.X, data.y, test_size=0.20, random_state=42)

Visualizing the training data:

plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Salary vs Experience (Training Set)')
plt.scatter(X_train,y_train)
plt.show()

You can see that the relationship looks very much linear.

Defining our cost function:

def cost(y, ypred):
    # Mean squared error between the actual and predicted values
    return (1/len(y)) * sum((y - ypred)**2)

Initializing the parameters:

#Initialize parameters
n = float(len(X_train))
alpha=0.001 #learning rate
b1=0 #slope
b2=0 #intercept

Now comes the interesting part: training the model.

epochs = 20000

for itr in range(epochs):
    y_pred = b1*X_train + b2                          # The current predicted value of y
    D1 = (-2/n) * sum(X_train * (y_train - y_pred))   # Derivative wrt b1
    D2 = (-2/n) * sum(y_train - y_pred)               # Derivative wrt b2
    b1 = b1 - alpha * D1                              # Update b1
    b2 = b2 - alpha * D2                              # Update b2

error = cost(y_train, y_pred)
print("MSE: ", error, '\n')
print(b1, b2)

Look at the values we obtained after training for 20,000 iterations!

Visualizing our results!!
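The plotting code for this step isn't shown above, so here is a minimal sketch of how you might overlay the fitted line on the training points, reusing the trained b1 and b2:

plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Salary vs Experience (Training Set)')
plt.scatter(X_train, y_train)                     # the training points
plt.plot(X_train, b1*X_train + b2, color='red')   # the learned regression line
plt.show()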

Congrats! We were able to find the best-fitting line for the data and minimize the cost function.

Predicting for a specific test case:

def predict(x):
    # Predict the salary for x years of experience using the trained coefficients
    return b2 + b1*x

predict(1.7)

The result:

And given our data, this answer makes sense.
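Since we set aside a test set earlier with train_test_split, it is also worth a quick check on unseen data; a small sketch reusing the cost and predict functions defined above:

test_pred = predict(X_test)                   # predictions for the held-out examples
print("Test MSE:", cost(y_test, test_pred))   # mean squared error on the test set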

So we're done with our Gradient Descent implementation, and the equation of the best-fitting line we obtained is y = b1*x + b2, with the trained values of b1 and b2 printed above.

That's it. We did it!!

I hope this article was helpful to you. Let me know if you found any errors. Feel free to contact me for any queries. Happy learning!! 😊

You can check out my Github profile for the code:

Some useful resources:

  1. https://www.youtube.com/watch?v=sDv4f4s2SB8 (Gradient Descent by StatQuest)
  2. https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e
  3. https://mubaris.com/posts/linear-regression/
  4. https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931
  5. https://www.coursera.org/learn/machine-learning (Machine Learning Course by Andrew Ng)

Contact me:

LinkedIn: https://www.linkedin.com/in/somya22/

Email: somya.m2000@gmail.com
