Compute the Gradient of Cost Function from Logistic Regression

As in linear regression with one variable, we have a design matrix ($X$) that represents our dataset; specifically, it has the following shape:

\begin{equation}
\label{eq:x-dataset}
X = \begin{bmatrix} x^{(1)} \\ x^{(2)} \\ \vdots \\ x^{(m)} \end{bmatrix}
\end{equation}

We say that the design matrix ($X$) in Equation \eqref{eq:x-dataset} has $m$ training examples and 1 feature, $x$.

The logistic regression model shall be trained on $X$. Those who are not familiar with logistic regression can study the model in Week 4 of the Machine Learning course by Andrew Ng. The model is defined as follows:

\begin{equation}
h_{\theta}(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}}
\end{equation}

Furthermore, the logistic regression model has a cost function $J(\theta)$,

\begin{equation}
\label{eq:cost-function}
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left( h_{\theta}(x^{(i)}), y^{(i)} \right)
\end{equation}

with

\begin{equation}
\label{eq:cost-logistic}
\mathrm{Cost}\left( h_{\theta}(x^{(i)}), y^{(i)} \right) = -y^{(i)} \log\left( h_{\theta}(x^{(i)}) \right) - \left( 1 - y^{(i)} \right) \log\left( 1 - h_{\theta}(x^{(i)}) \right)
\end{equation}

and $x^{(i)}$ is the $i$th training example and $y^{(i)}$ is the label (class) of training example $x^{(i)}$.
This article explains how to calculate the partial derivatives of the logistic regression cost function with respect to $\theta_0$ and $\theta_1$. These partial derivatives are also called the gradient, $\frac{\partial J}{\partial \theta}$.

The Complete Form of the Logistic Regression Cost Function

By combining Equations \eqref{eq:cost-function} and \eqref{eq:cost-logistic}, a more detailed cost function can be obtained as follows:

\begin{equation}
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left( h_{\theta}(x^{(i)}) \right) + \left( 1 - y^{(i)} \right) \log\left( 1 - h_{\theta}(x^{(i)}) \right) \right]
\end{equation}

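To make the cost function concrete, here is a minimal NumPy sketch of it; the function names (`sigmoid`, `cost`) and the toy data are my own illustration and not part of the original derivation.

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + e^(-z)), with z = theta_0 + theta_1 * x
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta0, theta1, x, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    h = sigmoid(theta0 + theta1 * x)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# toy dataset: one feature x and binary labels y
x = np.array([0.5, 1.5, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(cost(0.0, 1.0, x, y))
```
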
Next, $\frac{\partial J}{\partial \theta_0}$ and $\frac{\partial J}{\partial \theta_1}$ shall be computed. We begin with the partial derivative of $h_{\theta}(x)$ with respect to $\theta_0$, that is, $\frac{\partial h_{\theta}}{ \partial \theta_0 }$.
From calculus, the derivative of $\frac{u(x)}{v(x)}$, where $u(x)$ and $v(x)$ are both functions of $x$, is

\begin{equation}
\label{eq:formula-derivatif}
\left( \frac{u(x)}{v(x)} \right)^{\prime} = \frac{u^{\prime}(x)\, v(x) - u(x)\, v^{\prime}(x)}{\left( v(x) \right)^2}
\end{equation}

where $u^{\prime}$ and $v^{\prime}$ are the first derivatives of $u$ and $v$, respectively.
We shall utilize the formula in Equation \eqref{eq:formula-derivatif} to calculate $\frac{\partial h_{\theta}}{ \partial \theta_0 }$ and $\frac{\partial h_{\theta}}{ \partial \theta_1 }$ as follows:

\begin{equation}
\label{eq:formula-derivatif-theta0}
\begin{split}
\frac{\partial h_{\theta}}{\partial \theta_0} &= \frac{\partial}{\partial \theta_0} \left( \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}} \right) \\
&= \frac{0 \cdot \left( 1 + e^{-(\theta_0 + \theta_1 x)} \right) - 1 \cdot \left( -e^{-(\theta_0 + \theta_1 x)} \right)}{\left( 1 + e^{-(\theta_0 + \theta_1 x)} \right)^2} \\
&= \frac{e^{-(\theta_0 + \theta_1 x)}}{\left( 1 + e^{-(\theta_0 + \theta_1 x)} \right)^2} \\
&= h_{\theta}(x) \left( 1 - h_{\theta}(x) \right)
\end{split}
\end{equation}

and

\begin{equation}
\label{eq:formula-derivatif-theta1}
\begin{split}
\frac{\partial h_{\theta}}{\partial \theta_1} &= \frac{\partial}{\partial \theta_1} \left( \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}} \right) \\
&= \frac{0 \cdot \left( 1 + e^{-(\theta_0 + \theta_1 x)} \right) - 1 \cdot \left( -x \, e^{-(\theta_0 + \theta_1 x)} \right)}{\left( 1 + e^{-(\theta_0 + \theta_1 x)} \right)^2} \\
&= \frac{x \, e^{-(\theta_0 + \theta_1 x)}}{\left( 1 + e^{-(\theta_0 + \theta_1 x)} \right)^2} \\
&= h_{\theta}(x) \left( 1 - h_{\theta}(x) \right) x.
\end{split}
\end{equation}


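As a quick sanity check of Equations \eqref{eq:formula-derivatif-theta0} and \eqref{eq:formula-derivatif-theta1}, the short sketch below compares the closed forms $h_{\theta}(x)(1 - h_{\theta}(x))$ and $h_{\theta}(x)(1 - h_{\theta}(x))\,x$ against central finite differences; the helper name `h` and the sample values are my own choices.

```python
import numpy as np

def h(theta0, theta1, x):
    # the logistic regression hypothesis for one feature x
    return 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))

theta0, theta1, x = 0.3, -0.7, 2.0
eps = 1e-6

# closed forms derived above
dh_dtheta0 = h(theta0, theta1, x) * (1 - h(theta0, theta1, x))
dh_dtheta1 = h(theta0, theta1, x) * (1 - h(theta0, theta1, x)) * x

# central finite differences
num_dtheta0 = (h(theta0 + eps, theta1, x) - h(theta0 - eps, theta1, x)) / (2 * eps)
num_dtheta1 = (h(theta0, theta1 + eps, x) - h(theta0, theta1 - eps, x)) / (2 * eps)

print(dh_dtheta0, num_dtheta0)  # the two values should agree closely
print(dh_dtheta1, num_dtheta1)
```
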
Calculate $\frac{\partial J}{\partial \theta_0}$

The partial derivative $\frac{\partial J}{\partial \theta_0}$ is calculated as follows:

\begin{equation}
\label{eq:bagian2-theta0}
\frac{\partial J}{\partial \theta_0} = -\frac{1}{m} \sum_{i=1}^{m} \Bigg[ \underbrace{y^{(i)} \frac{\partial}{\partial \theta_0} \log\left( h_{\theta}(x^{(i)}) \right)}_{\text{Part I}} + \underbrace{\left( 1 - y^{(i)} \right) \frac{\partial}{\partial \theta_0} \log\left( 1 - h_{\theta}(x^{(i)}) \right)}_{\text{Part II}} \Bigg]
\end{equation}

Part I of Equation \eqref{eq:bagian2-theta0} is calculated with the chain rule and Equation \eqref{eq:formula-derivatif-theta0}; it becomes

\begin{equation}
\label{eq:bagian-I-theta0}
\begin{split}
y^{(i)} \frac{\partial}{\partial \theta_0} \log\left( h_{\theta}(x^{(i)}) \right) &= y^{(i)} \cdot \frac{1}{h_{\theta}(x^{(i)})} \cdot h_{\theta}(x^{(i)}) \left( 1 - h_{\theta}(x^{(i)}) \right) \\
&= y^{(i)} \left( 1 - h_{\theta}(x^{(i)}) \right).
\end{split}
\end{equation}

Part II of Equation \eqref{eq:bagian2-theta0} is also calculated with the chain rule and Equation \eqref{eq:formula-derivatif-theta0}; it becomes

\begin{equation}
\label{eq:bagian-II-theta0}
\begin{split}
\left( 1 - y^{(i)} \right) \frac{\partial}{\partial \theta_0} \log\left( 1 - h_{\theta}(x^{(i)}) \right) &= \left( 1 - y^{(i)} \right) \cdot \frac{1}{1 - h_{\theta}(x^{(i)})} \cdot \left( -h_{\theta}(x^{(i)}) \left( 1 - h_{\theta}(x^{(i)}) \right) \right) \\
&= -\left( 1 - y^{(i)} \right) h_{\theta}(x^{(i)}).
\end{split}
\end{equation}

By substituting Equation \eqref{eq:bagian-I-theta0} and Equation \eqref{eq:bagian-II-theta0} into Equation \eqref{eq:bagian2-theta0}, we obtain

\begin{equation}
\begin{split}
\frac{\partial J}{\partial \theta_0} &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \left( 1 - h_{\theta}(x^{(i)}) \right) - \left( 1 - y^{(i)} \right) h_{\theta}(x^{(i)}) \right] \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} - h_{\theta}(x^{(i)}) \right] \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right).
\end{split}
\end{equation}


Calculate $\frac{\partial J}{\partial \theta_1}$

The partial derivative $\frac{\partial J}{\partial \theta_1}$ can be calculated as follows:

\begin{equation}
\label{eq:bagian2-theta1}
\frac{\partial J}{\partial \theta_1} = -\frac{1}{m} \sum_{i=1}^{m} \Bigg[ \underbrace{y^{(i)} \frac{\partial}{\partial \theta_1} \log\left( h_{\theta}(x^{(i)}) \right)}_{\text{Part I}} + \underbrace{\left( 1 - y^{(i)} \right) \frac{\partial}{\partial \theta_1} \log\left( 1 - h_{\theta}(x^{(i)}) \right)}_{\text{Part II}} \Bigg]
\end{equation}

Part I of Equation \eqref{eq:bagian2-theta1} is calculated with the chain rule and Equation \eqref{eq:formula-derivatif-theta1}; it becomes

\begin{equation}
\label{eq:bagian-I-theta1}
\begin{split}
y^{(i)} \frac{\partial}{\partial \theta_1} \log\left( h_{\theta}(x^{(i)}) \right) &= y^{(i)} \cdot \frac{1}{h_{\theta}(x^{(i)})} \cdot h_{\theta}(x^{(i)}) \left( 1 - h_{\theta}(x^{(i)}) \right) x^{(i)} \\
&= y^{(i)} \left( 1 - h_{\theta}(x^{(i)}) \right) x^{(i)}.
\end{split}
\end{equation}

Part II of Equation \eqref{eq:bagian2-theta1} is also calculated with the chain rule and Equation \eqref{eq:formula-derivatif-theta1}; it becomes

\begin{equation}
\label{eq:bagian-II-theta1}
\begin{split}
\left( 1 - y^{(i)} \right) \frac{\partial}{\partial \theta_1} \log\left( 1 - h_{\theta}(x^{(i)}) \right) &= \left( 1 - y^{(i)} \right) \cdot \frac{1}{1 - h_{\theta}(x^{(i)})} \cdot \left( -h_{\theta}(x^{(i)}) \left( 1 - h_{\theta}(x^{(i)}) \right) x^{(i)} \right) \\
&= -\left( 1 - y^{(i)} \right) h_{\theta}(x^{(i)}) \, x^{(i)}.
\end{split}
\end{equation}

Again, by substituting Equation \eqref{eq:bagian-I-theta1} and Equation \eqref{eq:bagian-II-theta1} into Equation \eqref{eq:bagian2-theta1}, we obtain

\begin{equation}
\begin{split}
\frac{\partial J}{\partial \theta_1} &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \left( 1 - h_{\theta}(x^{(i)}) \right) - \left( 1 - y^{(i)} \right) h_{\theta}(x^{(i)}) \right] x^{(i)} \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x^{(i)}.
\end{split}
\end{equation}

Therefore, the gradient of the logistic regression model with one variable, $x$, and two parameters, $\theta_0$ and $\theta_1$, is

\begin{equation}
\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)
\end{equation}

and

\begin{equation}
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x^{(i)}.
\end{equation}

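These two formulas can be verified numerically by comparing them against finite differences of $J(\theta)$; the sketch below does so on a toy dataset of my own choosing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta0, theta1, x, y):
    h = sigmoid(theta0 + theta1 * x)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

x = np.array([0.5, 1.5, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta0, theta1, eps = 0.3, -0.7, 1e-6

# gradient from the formulas derived above
h = sigmoid(theta0 + theta1 * x)
grad_theta0 = np.mean(h - y)        # (1/m) * sum(h - y)
grad_theta1 = np.mean((h - y) * x)  # (1/m) * sum((h - y) * x)

# numerical gradient via central finite differences of the cost
num_theta0 = (cost(theta0 + eps, theta1, x, y) - cost(theta0 - eps, theta1, x, y)) / (2 * eps)
num_theta1 = (cost(theta0, theta1 + eps, x, y) - cost(theta0, theta1 - eps, x, y)) / (2 * eps)

print(grad_theta0, num_theta0)  # the pairs should agree closely
print(grad_theta1, num_theta1)
```
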
In general, the gradient of a logistic regression model with $n$ variables, $x_1, x_2, \ldots, x_n$, and $n+1$ parameters, $\theta_0, \theta_1, \ldots, \theta_n$, is

\begin{equation} \frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} ( h_{\theta}(x^{(i)}) - y^{(i)} ) x_{j}^{(i)}. \end{equation}

Additionally, in the case $j=0$, we have $x_0^{(i)} = 1$ for $i = 1, 2, \ldots, m$.
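In vectorized form, this general gradient is $\frac{1}{m} X^{\top} \left( h_{\theta}(X) - y \right)$, where the first column of $X$ holds the $x_0^{(i)} = 1$ entries. Below is a minimal NumPy sketch of that computation; the function name `gradient` and the toy data are my own illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    # (1/m) * X^T (h - y): the vectorized form of the equation above
    m = len(y)
    h = sigmoid(X @ theta)
    return (X.T @ (h - y)) / m

# toy dataset: m = 4 examples, n = 1 feature, plus the x_0 = 1 column
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)  # [theta_0, theta_1]
print(gradient(theta, X, y))
```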


Written on April 8, 2019