
Computing the Gradient of the Logistic Regression Cost Function

As in linear regression with one variable, we have a design matrix $X$ that represents our dataset and has the following shape:

$$
X = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}. \tag{1}
$$

We say that the design matrix $X$ in Equation (1) has $m$ training examples and one feature, $x$.
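As a concrete illustration, such a design matrix can be assembled in NumPy by stacking a column of ones next to the feature column. This is only a sketch; the feature values below are made up:

```python
import numpy as np

# Toy feature values for m = 4 training examples (made up for illustration).
x = np.array([0.5, 1.5, 2.0, 3.0])
m = x.shape[0]

# Design matrix of Equation (1): a column of ones (the intercept term)
# next to the feature column.
X = np.column_stack([np.ones(m), x])
print(X.shape)  # (4, 2): m rows, intercept column plus one feature
```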

A logistic regression model shall be trained on $X$. Readers who are not familiar with logistic regression can study it in Week 4 of Andrew Ng's Machine Learning course. The model's hypothesis is as follows:

$$
h_\theta(x^{(i)}) = \frac{1}{1 + e^{-\theta_0 - \theta_1 x^{(i)}}}. \tag{2}
$$
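In code, the hypothesis is simply the sigmoid function applied to $\theta_0 + \theta_1 x$. Here is a minimal NumPy transcription of Equation (2); the function name `h` and its signature are illustrative choices, not part of the original text:

```python
import numpy as np

def h(theta0, theta1, x):
    """Hypothesis of Equation (2): the sigmoid of theta0 + theta1 * x."""
    return 1.0 / (1.0 + np.exp(-theta0 - theta1 * x))

print(h(0.0, 0.0, 1.0))  # 0.5: with both parameters at zero, the model is maximally uncertain
```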

Furthermore, the logistic regression model has a cost function $J(\theta)$,

$$
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) \tag{3}
$$

with

$$
\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \tag{4}
$$

where $x^{(i)}$ is the $i$-th training example and $y^{(i)}$ is the label (class) of the training example $x^{(i)}$.
This article explains how to calculate the partial derivatives of the logistic regression cost function with respect to $\theta_0$ and $\theta_1$. Together, these partial derivatives are called the gradient, $\nabla_\theta J$.

The Complete Form of the Logistic Regression Cost Function

By combining Equations (3) and (4), a more detailed cost function can be obtained as follows:

$$
\begin{aligned}
J(\theta) &= \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right).
\end{aligned} \tag{5}
$$
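Equation (5) translates directly into a vectorized NumPy expression. This sketch assumes `x` and `y` are NumPy arrays holding the $m$ feature values and binary labels:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Cost J(theta) of Equation (5), averaged over the m training examples."""
    p = 1.0 / (1.0 + np.exp(-theta0 - theta1 * x))  # h_theta(x^(i)) for every i
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```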

Next, $\frac{\partial J}{\partial \theta_0}$ and $\frac{\partial J}{\partial \theta_1}$ shall be computed. We begin with the partial derivative of $h_\theta(x)$ with respect to $\theta_0$, i.e. $\frac{\partial h_\theta}{\partial \theta_0}$.
From calculus, the derivative of a quotient $\frac{u(x)}{v(x)}$, where $u(x)$ and $v(x)$ are both functions of $x$, is

$$
\left( \frac{u}{v} \right)' = \frac{u'v - uv'}{v^2} \tag{6}
$$

where $u'$ and $v'$ are the first derivatives of $u$ and $v$, respectively.
We shall use the quotient rule in Equation (6) to calculate $\frac{\partial h_\theta}{\partial \theta_0}$ and $\frac{\partial h_\theta}{\partial \theta_1}$ as follows:

$$
\begin{aligned}
\frac{\partial h_\theta}{\partial \theta_0} &= \frac{0 + e^{-\theta_0 - \theta_1 x}}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} = \frac{e^{-\theta_0 - \theta_1 x}}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( \frac{e^{-\theta_0 - \theta_1 x}}{1 + e^{-\theta_0 - \theta_1 x}} \right) \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( 1 - \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \\
&= h_\theta(x) \left( 1 - h_\theta(x) \right)
\end{aligned} \tag{7}
$$

and

$$
\begin{aligned}
\frac{\partial h_\theta}{\partial \theta_1} &= \frac{0 + e^{-\theta_0 - \theta_1 x}\, x}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} = \frac{e^{-\theta_0 - \theta_1 x}\, x}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( \frac{e^{-\theta_0 - \theta_1 x}}{1 + e^{-\theta_0 - \theta_1 x}} \right) x \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( 1 - \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) x \\
&= h_\theta(x) \left( 1 - h_\theta(x) \right) x.
\end{aligned} \tag{8}
$$
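Equations (7) and (8) can be sanity-checked numerically: a central finite difference of $h_\theta$ should agree with the closed forms. The parameter values below are arbitrary illustrations:

```python
import numpy as np

def h(t0, t1, x):
    return 1.0 / (1.0 + np.exp(-t0 - t1 * x))

theta0, theta1, x, eps = 0.3, -1.2, 2.0, 1e-6
hx = h(theta0, theta1, x)

# Central differences approximate the two partial derivatives of h.
dh_dt0 = (h(theta0 + eps, theta1, x) - h(theta0 - eps, theta1, x)) / (2 * eps)
dh_dt1 = (h(theta0, theta1 + eps, x) - h(theta0, theta1 - eps, x)) / (2 * eps)

print(np.isclose(dh_dt0, hx * (1 - hx)))      # Equation (7)
print(np.isclose(dh_dt1, hx * (1 - hx) * x))  # Equation (8)
```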

Calculating $\frac{\partial J}{\partial \theta_0}$

The partial derivative $\frac{\partial J}{\partial \theta_0}$ is calculated as follows:

$$
\begin{aligned}
\frac{\partial J}{\partial \theta_0} &= \frac{\partial}{\partial \theta_0} \left( -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right) \right) \\
&= -\frac{1}{m} \frac{\partial}{\partial \theta_0} \left( \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_0} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + \left(1 - y^{(i)}\right) \underbrace{\frac{\partial}{\partial \theta_0} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg).
\end{aligned} \tag{9}
$$

Part I of Equation (9) is calculated with the chain rule and Equation (7), and becomes

$$
\begin{aligned}
\frac{\partial}{\partial \theta_0} \log\left(h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \left( \log\left(h_\theta(x^{(i)})\right) \right) \cdot \frac{\partial h_\theta}{\partial \theta_0} \\
&= \frac{1}{h_\theta(x^{(i)})} \cdot h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) \\
&= 1 - h_\theta(x^{(i)}).
\end{aligned} \tag{10}
$$

Part II of Equation (9) is also calculated with the chain rule and Equation (7), and becomes

$$
\begin{aligned}
\frac{\partial}{\partial \theta_0} \log\left(1 - h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \left( \log\left(1 - h_\theta(x^{(i)})\right) \right) \cdot \frac{\partial h_\theta}{\partial \theta_0} \\
&= \frac{-1}{1 - h_\theta(x^{(i)})} \cdot h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) \\
&= -h_\theta(x^{(i)}).
\end{aligned} \tag{11}
$$
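Both Part I and Part II can also be verified symbolically, for instance with SymPy. This is an optional check, not part of the original derivation:

```python
import sympy as sp

theta0, theta1, x = sp.symbols('theta0 theta1 x')
h = 1 / (1 + sp.exp(-theta0 - theta1 * x))

# Equation (10): d/dtheta0 of log(h) should simplify to 1 - h.
print(sp.simplify(sp.diff(sp.log(h), theta0) - (1 - h)))   # 0

# Equation (11): d/dtheta0 of log(1 - h) should simplify to -h.
print(sp.simplify(sp.diff(sp.log(1 - h), theta0) - (-h)))  # 0
```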

By substituting Equations (10) and (11) into Equation (9), we obtain

$$
\begin{aligned}
\frac{\partial J}{\partial \theta_0} &= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_0} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + \left(1 - y^{(i)}\right) \underbrace{\frac{\partial}{\partial \theta_0} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \left(1 - h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) h_\theta(x^{(i)}) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - y^{(i)} h_\theta(x^{(i)}) - h_\theta(x^{(i)}) + y^{(i)} h_\theta(x^{(i)}) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right).
\end{aligned}
$$

Calculating $\frac{\partial J}{\partial \theta_1}$

The partial derivative $\frac{\partial J}{\partial \theta_1}$ can be calculated as follows:

$$
\begin{aligned}
\frac{\partial J}{\partial \theta_1} &= \frac{\partial}{\partial \theta_1} \left( -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right) \right) \\
&= -\frac{1}{m} \frac{\partial}{\partial \theta_1} \left( \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_1} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + \left(1 - y^{(i)}\right) \underbrace{\frac{\partial}{\partial \theta_1} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg).
\end{aligned} \tag{12}
$$

Part I of Equation (12) is calculated with the chain rule and Equation (8), and becomes

$$
\begin{aligned}
\frac{\partial}{\partial \theta_1} \log\left(h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \left( \log\left(h_\theta(x^{(i)})\right) \right) \cdot \frac{\partial h_\theta}{\partial \theta_1} \\
&= \frac{1}{h_\theta(x^{(i)})} \cdot h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) x^{(i)} \\
&= \left(1 - h_\theta(x^{(i)})\right) x^{(i)}.
\end{aligned} \tag{13}
$$

Part II of Equation (12) is also calculated with the chain rule and Equation (8), and becomes

$$
\begin{aligned}
\frac{\partial}{\partial \theta_1} \log\left(1 - h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \left( \log\left(1 - h_\theta(x^{(i)})\right) \right) \cdot \frac{\partial h_\theta}{\partial \theta_1} \\
&= \frac{-1}{1 - h_\theta(x^{(i)})} \cdot h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) x^{(i)} \\
&= -h_\theta(x^{(i)})\, x^{(i)}.
\end{aligned} \tag{14}
$$

Again, by substituting Equations (13) and (14) into Equation (12), we obtain

$$
\begin{aligned}
\frac{\partial J}{\partial \theta_1} &= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_1} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + \left(1 - y^{(i)}\right) \underbrace{\frac{\partial}{\partial \theta_1} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \left(1 - h_\theta(x^{(i)})\right) x^{(i)} - \left(1 - y^{(i)}\right) h_\theta(x^{(i)})\, x^{(i)} \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - y^{(i)} h_\theta(x^{(i)}) - h_\theta(x^{(i)}) + y^{(i)} h_\theta(x^{(i)}) \right) x^{(i)} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)} \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}.
\end{aligned}
$$

Therefore, the gradient of the logistic regression model with one variable $x$ and two parameters $\theta_0$ and $\theta_1$ is

$$
\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
$$

and

$$
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}.
$$
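As a numerical cross-check, the finite-difference gradient of $J$ should match both closed forms on any dataset. The data below is randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)                         # made-up feature values
y = (rng.uniform(size=20) < 0.5).astype(float)  # made-up binary labels
theta0, theta1, eps = 0.4, -0.7, 1e-6

def J(t0, t1):
    p = 1.0 / (1.0 + np.exp(-t0 - t1 * x))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

p = 1.0 / (1.0 + np.exp(-theta0 - theta1 * x))
grad0 = np.mean(p - y)        # closed form for dJ/dtheta0
grad1 = np.mean((p - y) * x)  # closed form for dJ/dtheta1

print(np.isclose((J(theta0 + eps, theta1) - J(theta0 - eps, theta1)) / (2 * eps), grad0))
print(np.isclose((J(theta0, theta1 + eps) - J(theta0, theta1 - eps)) / (2 * eps), grad1))
```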

In general, the gradient of a logistic regression model with $n$ variables $x_1, x_2, \ldots, x_n$ and $n+1$ parameters $\theta_0, \theta_1, \ldots, \theta_n$ is

$$
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}.
$$

Additionally, in the case $j = 0$, we have $x_0^{(i)} = 1$ for $i = 1, 2, \ldots, m$.
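In vectorized form, the whole gradient can be computed with one matrix product. This sketch assumes `X` is the $m \times (n+1)$ design matrix with a leading column of ones and `theta` is the $(n+1)$-vector of parameters:

```python
import numpy as np

def gradient(theta, X, y):
    """Vectorized form of the general gradient: (1/m) * X^T (h_theta(X) - y).

    X:     m x (n+1) design matrix whose first column is all ones (x_0 = 1).
    theta: parameter vector (theta_0, theta_1, ..., theta_n).
    y:     vector of m binary labels.
    """
    m = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) for every example at once
    return (X.T @ (p - y)) / m
```

A batch gradient-descent step would then be `theta = theta - alpha * gradient(theta, X, y)` for some learning rate `alpha`.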


Written on April 8, 2019