This article explains **forward propagation** and **backpropagation** for one data instance in one round.

Suppose we have an artificial neural network (ANN) for *binary classification* that consists of one hidden layer. In particular, the hidden layer has 3 nodes (units), each with a *sigmoid activation function* ($\sigma$), as shown in the image below:

Additionally, the input layer and the hidden layer each have a *bias unit* (not shown in the image). We stick to Andrew Ng’s notation from his excellent Machine Learning course on Coursera. Let’s define our data instance:

Our ANN also has a true label, $y$, and two weight matrices, $\Theta^{(1)}$ and $\Theta^{(2)}$, specified as follows:

First, let us run forward propagation. Here is the computation for the hidden layer:

Next, we compute the output layer as follows:
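The two forward steps above can be sketched in NumPy. Note that the concrete numbers here (the instance `x` and the weight matrices) are hypothetical placeholders, since the article’s actual values are given in the images rather than the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values for illustration only; the article's actual
# x, Theta1, and Theta2 are shown in the images.
x = np.array([0.5, -0.2])             # one data instance with 2 features
Theta1 = np.array([[0.1,  0.3, -0.2],  # maps input (+ bias) -> 3 hidden units
                   [0.4, -0.5,  0.2],
                   [-0.3, 0.2,  0.6]])
Theta2 = np.array([[0.2, -0.4, 0.1, 0.5]])  # maps hidden (+ bias) -> 1 output

# Hidden layer: prepend the bias unit, multiply by Theta1, apply sigmoid
a1 = np.concatenate(([1.0], x))       # a^(1), input with bias unit
z2 = Theta1 @ a1                      # z^(2)
a2 = sigmoid(z2)                      # a^(2), hidden activations

# Output layer: prepend the bias unit again, multiply by Theta2, apply sigmoid
a2_bias = np.concatenate(([1.0], a2))
z3 = Theta2 @ a2_bias                 # z^(3)
a3 = sigmoid(z3)                      # a^(3) = h_Theta(x), the prediction
print(a3)
```

Because the output unit is a sigmoid, `a3` is a single value between 0 and 1, interpretable as the probability of the positive class.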

We have finished our **forward propagation**. In essence, forward propagation computes our model’s prediction. Now let us work through the backpropagation steps. The backpropagation algorithm calculates gradients, which are then used to update our weight matrices; these updates should make the model’s next prediction better than its current one.

Here are the backpropagation steps.

We have completed our **backpropagation algorithm**.
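The backpropagation steps above can be sketched as follows, reusing hypothetical placeholder values for `x`, `y`, and the weight matrices (the article’s actual numbers are in the images). In Andrew Ng’s notation, the output error is $\delta^{(3)} = a^{(3)} - y$, it is propagated back through $\Theta^{(2)}$ with the sigmoid derivative $a^{(2)} \odot (1 - a^{(2)})$, and the per-instance gradients are outer products of the errors with the previous layer’s activations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same hypothetical placeholder values used for the forward pass
x = np.array([0.5, -0.2])
y = 1.0                                # hypothetical true label
Theta1 = np.array([[0.1,  0.3, -0.2],
                   [0.4, -0.5,  0.2],
                   [-0.3, 0.2,  0.6]])
Theta2 = np.array([[0.2, -0.4, 0.1, 0.5]])

# Forward pass (backprop needs the activations)
a1 = np.concatenate(([1.0], x))
a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
a3 = sigmoid(Theta2 @ a2)

# Output-layer error: delta^(3) = a^(3) - y
delta3 = a3 - y

# Hidden-layer error: propagate back through Theta2, apply the sigmoid
# derivative a^(2) * (1 - a^(2)), and drop the bias term with [1:]
delta2 = (Theta2.T @ delta3)[1:] * a2[1:] * (1 - a2[1:])

# Gradients of the cost w.r.t. each weight matrix (single instance,
# no regularization)
Theta2_grad = np.outer(delta3, a2)
Theta1_grad = np.outer(delta2, a1)
```

Each gradient matrix has the same shape as the weight matrix it corresponds to, which is what allows the element-wise update in the next step.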

Finally, we can update our weight matrices as follows:
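The update is one step of gradient descent, $\Theta := \Theta - \alpha \, \frac{\partial J}{\partial \Theta}$. A minimal sketch, assuming a hypothetical learning rate and a stand-in gradient (the real gradient comes from the backpropagation step):

```python
import numpy as np

alpha = 0.1                            # hypothetical learning rate
Theta1 = np.array([[0.1,  0.3, -0.2],  # same placeholder weights as above
                   [0.4, -0.5,  0.2],
                   [-0.3, 0.2,  0.6]])
Theta1_grad = np.full_like(Theta1, 0.01)  # stand-in gradient for illustration

# One gradient-descent step: Theta := Theta - alpha * dJ/dTheta
Theta1 = Theta1 - alpha * Theta1_grad
```

The same update is applied to $\Theta^{(2)}$ with its own gradient. After the update, the next forward pass should produce a prediction closer to $y$.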