As in linear regression with one variable, we have a design matrix $X$ that represents our dataset and has the following shape:
$$
X = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}. \tag{1}
$$

We say that the design matrix $X$ in Equation (1) has $m$ training examples and one feature, $x$.
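As a concrete illustration, the design matrix of Equation (1) can be built in NumPy by prepending a column of ones to the raw feature values (a minimal sketch; the data values and variable names are illustrative, not from the original):

```python
import numpy as np

# Hypothetical feature values x^(1), ..., x^(m) for m = 4 training examples
x = np.array([0.5, 1.2, -0.3, 2.0])

# Design matrix of Equation (1): a column of ones (for the intercept theta_0)
# followed by the single feature column
X = np.column_stack([np.ones_like(x), x])
print(X.shape)  # (4, 2): m = 4 training examples, intercept column + 1 feature
```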
The logistic regression model shall be trained on $X$. Readers who are not familiar with logistic regression can study the model in Week 4 of Andrew Ng's Machine Learning course. The hypothesis of the logistic regression model is
$$
h_\theta(x^{(i)}) = \frac{1}{1 + e^{-\theta_0 - \theta_1 x^{(i)}}}. \tag{2}
$$
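A direct NumPy sketch of Equation (2), assuming the design matrix $X$ built above, so that `X @ theta` evaluates $\theta_0 + \theta_1 x^{(i)}$ for every example at once:

```python
import numpy as np

def h(theta, X):
    """Hypothesis of Equation (2), evaluated for all training examples.

    X is an (m, 2) design matrix as in Equation (1);
    theta = [theta_0, theta_1].
    """
    return 1.0 / (1.0 + np.exp(-X @ theta))
```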
Furthermore, the logistic regression model has a cost function $J(\theta)$,

$$
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) \tag{3}
$$

with
$$
\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right), \tag{4}
$$

where $x^{(i)}$ is the $i$-th training example and $y^{(i)}$ is the label or class of the training example $x^{(i)}$.
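Equations (3) and (4) translate directly into code; the sketch below assumes the design matrix and labels are NumPy arrays (names are illustrative):

```python
import numpy as np

def cost(theta, X, y):
    """Cost function J(theta) of Equations (3) and (4)."""
    m = len(y)
    p = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) from Equation (2)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) / m
```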
This article explains how to calculate the partial derivatives of the logistic regression cost function with respect to $\theta_0$ and $\theta_1$. These partial derivatives are also called the gradient, $\frac{\partial J}{\partial \theta}$.
The Complete Form of the Logistic Regression Cost Function
By combining Equations (3) and (4), a more detailed cost function can be obtained as follows:
$$
\begin{aligned}
J(\theta) &= \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right).
\end{aligned} \tag{5}
$$

Next, $\frac{\partial J}{\partial \theta_0}$ and $\frac{\partial J}{\partial \theta_1}$ shall be computed. We begin with the partial derivatives of $h_\theta(x)$ with respect to $\theta_0$ and $\theta_1$.
From calculus, the derivative of the quotient $\frac{u(x)}{v(x)}$, where $u(x)$ and $v(x)$ are each functions of $x$, is

$$
\left( \frac{u(x)}{v(x)} \right)' = \frac{u'(x)\,v(x) - u(x)\,v'(x)}{v(x)^2}, \tag{6}
$$

where $u'$ and $v'$ are the first derivatives of $u$ and $v$, respectively.
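This identity is easy to confirm symbolically; here is a small SymPy sketch (assuming SymPy is available):

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
v = sp.Function('v')(x)

# Equation (6): (u/v)' = (u'v - u v') / v^2
lhs = sp.diff(u / v, x)
rhs = (sp.diff(u, x) * v - u * sp.diff(v, x)) / v**2
assert sp.simplify(lhs - rhs) == 0
```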
We shall utilize the formula of Equation (6) to calculate $\frac{\partial h_\theta}{\partial \theta_0}$ and $\frac{\partial h_\theta}{\partial \theta_1}$ as follows:

$$
\begin{aligned}
\frac{\partial h_\theta}{\partial \theta_0} &= \frac{0 + e^{-\theta_0 - \theta_1 x}}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} \\
&= \frac{e^{-\theta_0 - \theta_1 x}}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( \frac{e^{-\theta_0 - \theta_1 x}}{1 + e^{-\theta_0 - \theta_1 x}} \right) \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( 1 - \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \\
&= h_\theta(x)\left(1 - h_\theta(x)\right)
\end{aligned} \tag{7}
$$

and
$$
\begin{aligned}
\frac{\partial h_\theta}{\partial \theta_1} &= \frac{0 + e^{-\theta_0 - \theta_1 x}\,x}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} \\
&= \frac{e^{-\theta_0 - \theta_1 x}\,x}{\left(1 + e^{-\theta_0 - \theta_1 x}\right)^2} \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( \frac{e^{-\theta_0 - \theta_1 x}}{1 + e^{-\theta_0 - \theta_1 x}} \right) x \\
&= \left( \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) \left( 1 - \frac{1}{1 + e^{-\theta_0 - \theta_1 x}} \right) x \\
&= h_\theta(x)\left(1 - h_\theta(x)\right) x.
\end{aligned} \tag{8}
$$
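Equations (7) and (8) can be double-checked symbolically as well; this SymPy sketch verifies both identities:

```python
import sympy as sp

theta0, theta1, x = sp.symbols('theta0 theta1 x')
h = 1 / (1 + sp.exp(-theta0 - theta1 * x))  # hypothesis of Equation (2)

# Equation (7): dh/dtheta0 = h * (1 - h)
assert sp.simplify(sp.diff(h, theta0) - h * (1 - h)) == 0

# Equation (8): dh/dtheta1 = h * (1 - h) * x
assert sp.simplify(sp.diff(h, theta1) - h * (1 - h) * x) == 0
```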
Calculate $\frac{\partial J}{\partial \theta_0}$
The partial derivative $\frac{\partial J}{\partial \theta_0}$ is calculated as follows:
$$
\begin{aligned}
\frac{\partial J}{\partial \theta_0} &= \frac{\partial}{\partial \theta_0} \left( -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right) \right) \\
&= -\frac{1}{m} \frac{\partial}{\partial \theta_0} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_0} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + (1 - y^{(i)}) \underbrace{\frac{\partial}{\partial \theta_0} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg).
\end{aligned} \tag{9}
$$

Part I of Equation (9) is calculated with the chain rule and Equation (7):
$$
\begin{aligned}
\frac{\partial}{\partial \theta_0} \log\left(h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \log\left(h_\theta(x^{(i)})\right) \frac{\partial h_\theta}{\partial \theta_0} \\
&= \frac{1}{h_\theta(x^{(i)})}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) \\
&= 1 - h_\theta(x^{(i)}).
\end{aligned} \tag{10}
$$

Part II of Equation (9) is also calculated with the chain rule and Equation (7):
$$
\begin{aligned}
\frac{\partial}{\partial \theta_0} \log\left(1 - h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \log\left(1 - h_\theta(x^{(i)})\right) \frac{\partial h_\theta}{\partial \theta_0} \\
&= -\frac{1}{1 - h_\theta(x^{(i)})}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) \\
&= -\frac{1}{1 - h_\theta(x^{(i)})}\, \left(1 - h_\theta(x^{(i)})\right) h_\theta(x^{(i)}) \\
&= -h_\theta(x^{(i)}).
\end{aligned} \tag{11}
$$

By substituting Equations (10) and (11) into Equation (9), we obtain
$$
\begin{aligned}
\frac{\partial J}{\partial \theta_0} &= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_0} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + (1 - y^{(i)}) \underbrace{\frac{\partial}{\partial \theta_0} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \left(1 - h_\theta(x^{(i)})\right) - (1 - y^{(i)})\, h_\theta(x^{(i)}) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - y^{(i)} h_\theta(x^{(i)}) - h_\theta(x^{(i)}) + y^{(i)} h_\theta(x^{(i)}) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right).
\end{aligned}
$$
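Before moving on, this result can be sanity-checked numerically; the sketch below compares the analytic $\frac{\partial J}{\partial \theta_0}$ against a central finite difference on hypothetical toy data (the data values are illustrative):

```python
import numpy as np

# Hypothetical toy data, just for the check
X = np.column_stack([np.ones(4), [0.5, 1.2, -0.3, 2.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
theta = np.array([0.1, -0.4])

def J(theta):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Analytic partial derivative: (1/m) * sum(h - y)
p = 1.0 / (1.0 + np.exp(-X @ theta))
analytic = np.mean(p - y)

# Central finite difference in the theta_0 direction
eps = 1e-6
numeric = (J(theta + [eps, 0.0]) - J(theta - [eps, 0.0])) / (2 * eps)
print(np.isclose(analytic, numeric))  # True
```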
Calculate $\frac{\partial J}{\partial \theta_1}$
The partial derivative $\frac{\partial J}{\partial \theta_1}$ can be calculated as follows:
$$
\begin{aligned}
\frac{\partial J}{\partial \theta_1} &= \frac{\partial}{\partial \theta_1} \left( -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right) \right) \\
&= -\frac{1}{m} \frac{\partial}{\partial \theta_1} \sum_{i=1}^{m} \left( y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_1} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + (1 - y^{(i)}) \underbrace{\frac{\partial}{\partial \theta_1} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg).
\end{aligned} \tag{12}
$$

Part I of Equation (12) is calculated with the chain rule and Equation (8):
$$
\begin{aligned}
\frac{\partial}{\partial \theta_1} \log\left(h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \log\left(h_\theta(x^{(i)})\right) \frac{\partial h_\theta}{\partial \theta_1} \\
&= \frac{1}{h_\theta(x^{(i)})}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) x^{(i)} \\
&= \left(1 - h_\theta(x^{(i)})\right) x^{(i)}.
\end{aligned} \tag{13}
$$

Part II of Equation (12) is also calculated with the chain rule and Equation (8):
$$
\begin{aligned}
\frac{\partial}{\partial \theta_1} \log\left(1 - h_\theta(x^{(i)})\right) &= \frac{\partial}{\partial h_\theta} \log\left(1 - h_\theta(x^{(i)})\right) \frac{\partial h_\theta}{\partial \theta_1} \\
&= -\frac{1}{1 - h_\theta(x^{(i)})}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) x^{(i)} \\
&= -\frac{1}{1 - h_\theta(x^{(i)})}\, \left(1 - h_\theta(x^{(i)})\right) h_\theta(x^{(i)})\, x^{(i)} \\
&= -h_\theta(x^{(i)})\, x^{(i)}.
\end{aligned} \tag{14}
$$

Again, by substituting Equations (13) and (14) into Equation (12), we obtain
$$
\begin{aligned}
\frac{\partial J}{\partial \theta_1} &= -\frac{1}{m} \sum_{i=1}^{m} \Bigg( y^{(i)} \underbrace{\frac{\partial}{\partial \theta_1} \log\left(h_\theta(x^{(i)})\right)}_{\text{Part I}} + (1 - y^{(i)}) \underbrace{\frac{\partial}{\partial \theta_1} \log\left(1 - h_\theta(x^{(i)})\right)}_{\text{Part II}} \Bigg) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \left(1 - h_\theta(x^{(i)})\right) x^{(i)} - (1 - y^{(i)})\, h_\theta(x^{(i)})\, x^{(i)} \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - y^{(i)} h_\theta(x^{(i)}) - h_\theta(x^{(i)}) + y^{(i)} h_\theta(x^{(i)}) \right) x^{(i)} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)} \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}.
\end{aligned}
$$

Therefore, the gradient of the logistic regression model with one variable, $x$, and two parameters, $\theta_0$ and $\theta_1$, is
$$
\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \tag{15}
$$

and
$$
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}. \tag{16}
$$

In general, the gradient of a logistic regression model with $n$ variables, $x_1, x_2, \ldots, x_n$, and $n+1$ parameters, $\theta_0, \theta_1, \ldots, \theta_n$, is
$$
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}. \tag{17}
$$
Additionally, in the case $j = 0$, we have $x_0^{(i)} = 1$ for $i = 1, 2, \ldots, m$.
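Since $x_0^{(i)} = 1$ is exactly the column of ones in the design matrix of Equation (1), Equation (17) for all $j$ at once reduces to a single matrix product. Here is a minimal vectorized sketch, with a finite-difference check of both gradient entries on the same hypothetical toy data as before:

```python
import numpy as np

def gradient(theta, X, y):
    """Equation (17), vectorized: entry j is (1/m) * sum((h - y) * x_j^(i)).

    The column of ones in X supplies x_0^(i) = 1, so the j = 0 case of
    Equation (15) needs no special handling.
    """
    m = len(y)
    p = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) for all i
    return X.T @ (p - y) / m

def J(theta, X, y):
    """Cost of Equation (5), used here only for the numerical check."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data: m = 4 examples, one feature
X = np.column_stack([np.ones(4), [0.5, 1.2, -0.3, 2.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
theta = np.array([0.1, -0.4])

# Compare the analytic gradient with central finite differences
eps = 1e-6
numeric = np.array([
    (J(theta + eps * e, X, y) - J(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(gradient(theta, X, y), numeric))  # True
```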