Hendra Bunyamin

Forgiven sinner and Lecturer at Maranatha Christian University

Deriving Normal Equation of Linear Regression Model

This article is inspired by an excellent post by Eli Bendersky. Let’s continue with the derivation.

The cost function is explained in Week 1 and Week 2 of the Machine Learning course taught by Andrew Ng. This post explains how to derive the normal equation for linear regression with multiple variables. It helps if readers have studied Week 1 and Week 2 before reading this post.

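The cost function of linear regression with multiple variables, $J(\theta)$, is formulated as follows:

$$
\begin{equation}
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2
\label{eq:cost-function}
\end{equation}
$$

where $m$ is the number of instances in the dataset, $h_{\theta}(x^{(i)})$ is our hypothesis (also known as the prediction model) evaluated on the $i$th instance, and $y^{(i)}$ is the true value for the $i$th instance.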

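We have also studied that the hypothesis is linear in the parameters:

$$
\begin{equation}
h_{\theta}(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n
\label{eq:the-hyphotesis}
\end{equation}
$$

where $x_0 = 1$ by convention, so that $\theta_0$ acts as the intercept.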

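By substituting \eqref{eq:the-hyphotesis} into \eqref{eq:cost-function}, we obtain

$$
\begin{equation}
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2
\label{eq:derivation-5}
\end{equation}
$$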
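As a minimal sketch, the cost with the linear hypothesis substituted in can be evaluated directly in plain Python. The function name `cost` and the toy data are illustrative assumptions, not part of the course material; each feature vector starts with the bias feature $x_0 = 1$.

```python
def cost(theta, xs, ys):
    """Return J(theta) = 1/(2m) * sum_i (theta^T x_i - y_i)^2."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        # theta^T x: the prediction for this instance
        prediction = sum(t * xj for t, xj in zip(theta, x))
        total += (prediction - y) ** 2
    return total / (2 * m)


# Hypothetical toy data lying exactly on the line y = x.
xs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 2.0, 3.0]

perfect = cost([0.0, 1.0], xs, ys)  # parameters that fit the data exactly
zeros = cost([0.0, 0.0], xs, ys)    # all-zero parameters predict 0 everywhere
print(perfect, zeros)
```

The perfect fit gives a cost of zero, while any other choice of $\theta$ gives a strictly positive cost on this data.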

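By defining

$$
X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}
\quad \text{and} \quad
y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix},
$$

so that the $i$th entry of $X\theta - y$ is $\theta^T x^{(i)} - y^{(i)}$, equation \eqref{eq:derivation-5} becomes

$$
\begin{equation}
\begin{aligned}
J(\theta) &= \frac{1}{2m} (X\theta - y)^T (X\theta - y) \\
          &= \frac{1}{2m} \Big( \underbrace{\theta^T X^T X \theta}_{\text{Part I}} - 2\, \underbrace{\theta^T X^T y}_{\text{Part II}} + y^T y \Big)
\end{aligned}
\label{eq:derivation-10}
\end{equation}
$$

The two cross terms combine because $y^T X \theta$ is a scalar and therefore equals its own transpose, $\theta^T X^T y$.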
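The step from a sum of squares to a matrix expression can be checked numerically. The sketch below, which assumes a small hypothetical dataset and uses illustrative helper names (`matvec`, `cost_matrix_form`, `cost_expanded`), computes $\frac{1}{2m}(X\theta - y)^T(X\theta - y)$ and its expanded form $\frac{1}{2m}(\theta^TX^TX\theta - 2\theta^TX^Ty + y^Ty)$ and confirms that they agree.

```python
def matvec(X, theta):
    """Return the vector X @ theta for a list-of-rows matrix X."""
    return [sum(xij * tj for xij, tj in zip(row, theta)) for row in X]


def cost_matrix_form(theta, X, y):
    """(X theta - y)^T (X theta - y) / (2m)."""
    m = len(X)
    residual = [p - yi for p, yi in zip(matvec(X, theta), y)]
    return sum(r * r for r in residual) / (2 * m)


def cost_expanded(theta, X, y):
    """(theta^T X^T X theta - 2 theta^T X^T y + y^T y) / (2m)."""
    m = len(X)
    x_theta = matvec(X, theta)
    quadratic = sum(p * p for p in x_theta)             # (X theta)^T (X theta)
    cross = sum(p * yi for p, yi in zip(x_theta, y))    # (X theta)^T y
    y_squared = sum(yi * yi for yi in y)                # y^T y
    return (quadratic - 2 * cross + y_squared) / (2 * m)


# Hypothetical data: first column of X is the bias feature x_0 = 1.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 2.0]
theta = [0.5, 0.5]

compact = cost_matrix_form(theta, X, y)
expanded = cost_expanded(theta, X, y)
print(compact, expanded)
```

Both functions return the same value up to floating-point rounding, which is what the expansion of the product claims.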

We have arrived at a matrix form of the linear regression cost function. Our next step is to answer the question:

How can we minimize the cost function in Equation \eqref{eq:derivation-10}?

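We will employ differentiation formulas from matrix calculus; specifically, we use two scalar-by-vector identities in denominator layout (the result is a column vector). For a symmetric matrix $A$ and a constant vector $a$, the identities are as follows:

$$
\begin{equation}
\frac{\partial\, x^T A x}{\partial x} = 2 A x
\label{eq:identity-1}
\end{equation}
$$

$$
\begin{equation}
\frac{\partial\, x^T a}{\partial x} = a
\label{eq:identity-2}
\end{equation}
$$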


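Now equipped with these identities, let us minimize Equation \eqref{eq:derivation-10} by computing the first derivative of $J(\theta)$; specifically, Part I, the quadratic term $\theta^T X^T X \theta$ (note that $X^T X$ is symmetric), is differentiated with Equation \eqref{eq:identity-1}, and Part II, the linear term $\theta^T X^T y$, with Equation \eqref{eq:identity-2}:

$$
\begin{equation}
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta}
  &= \frac{1}{2m} \frac{\partial}{\partial \theta} \left( \theta^T X^T X \theta - 2\, \theta^T X^T y + y^T y \right) \\
  &= \frac{1}{2m} \left( 2 X^T X \theta - 2 X^T y \right) \\
  &= \frac{1}{m} \left( X^T X \theta - X^T y \right)
\end{aligned}
\end{equation}
$$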
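A quick way to gain confidence in a hand-derived gradient is to compare it with a finite-difference approximation. The sketch below assumes the gradient of the cost is $\frac{1}{m}X^T(X\theta - y)$ and checks it against central differences on a small hypothetical dataset; all function names are illustrative.

```python
def matvec(X, theta):
    """Return the vector X @ theta for a list-of-rows matrix X."""
    return [sum(xij * tj for xij, tj in zip(row, theta)) for row in X]


def cost(theta, X, y):
    """(X theta - y)^T (X theta - y) / (2m)."""
    residual = [p - yi for p, yi in zip(matvec(X, theta), y)]
    return sum(r * r for r in residual) / (2 * len(X))


def gradient(theta, X, y):
    """Analytic gradient (1/m) X^T (X theta - y)."""
    m = len(X)
    residual = [p - yi for p, yi in zip(matvec(X, theta), y)]
    return [sum(X[i][j] * residual[i] for i in range(m)) / m
            for j in range(len(theta))]


def numerical_gradient(theta, X, y, eps=1e-6):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad


# Hypothetical data: first column of X is the bias feature x_0 = 1.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 2.0]
theta = [0.5, 0.5]

analytic = gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
print(analytic, numeric)
```

Because the cost is quadratic in $\theta$, the central difference matches the analytic gradient up to rounding error.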

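In order to find the $\theta$ which minimizes Equation \eqref{eq:derivation-10}, we need to solve

$$
\begin{equation}
\frac{\partial J(\theta)}{\partial \theta} = 0
\quad \Longrightarrow \quad
\frac{1}{m} \left( X^T X \theta - X^T y \right) = 0
\quad \Longrightarrow \quad
X^T X \theta = X^T y
\end{equation}
$$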

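At last, we have derived the normal equation of the linear regression model, which, provided that $X^T X$ is invertible, is

$$
\begin{equation}
\theta = (X^T X)^{-1} X^T y
\end{equation}
$$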
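To see the normal equation at work, here is a minimal sketch that forms $X^TX$ and $X^Ty$ and solves the linear system $X^TX\theta = X^Ty$ by Gaussian elimination with partial pivoting. It is kept to the standard library so it runs anywhere; the function names and the toy data are assumptions made for illustration. In practice one would more likely call NumPy's `numpy.linalg.solve` or `numpy.linalg.lstsq` rather than inverting $X^TX$ explicitly.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x


def normal_equation(X, y):
    """Return theta solving X^T X theta = X^T y."""
    m, n = len(X), len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(m)) for c in range(n)]
           for r in range(n)]
    xty = [sum(X[i][r] * y[i] for i in range(m)) for r in range(n)]
    return solve(xtx, xty)


# Hypothetical data: first column of X is the bias feature x_0 = 1.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 2.0]

theta = normal_equation(X, y)
print(theta)  # approximately [0.667, 0.5], i.e. the line y = 2/3 + x/2
```

For this data $X^TX = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$ and $X^Ty = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$, so the system can also be solved by hand to confirm $\theta = (2/3,\ 1/2)$.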

Written on August 18, 2019