PART 2 : WRITING A DYNAMIC NEURAL NETWORK FRAMEWORK FROM SCRATCH IN PYTHON

Angad Sandhu
5 min read · Dec 23, 2020

Now we will start implementing our Neural Network Framework by jumping right into Python code!

If you have not read Part 1 of this series, I would suggest you have a look here:

Creating our Class Constructor

We will initialize our constructor by:

  • taking the input matrix
  • saving the learning rate
  • setting the number of layers of our Neural Network (default = 2)
  • taking the output matrix
  • initializing our loss

Here, I have decided to randomly choose the number of nodes in each hidden layer (excluding the input and output layers) using randint(), where the minimum and maximum possible numbers of nodes are 2 and 10 respectively.

These values are saved in the num_nodes array. After this, we also add the shapes of the input and output layers (i.e. 1) to the array at the first and last positions.

Also, using the number of layers and the number of nodes in each layer, we will define the shapes of our Weights and Biases, which will be initialized to random values (between 0 and 1).

Because I can’t just dynamically initialize the weights and biases as ordinary variables, I build the assignment statements as strings, edit them dynamically, and then execute them using Python’s built-in exec() function.
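A minimal sketch of what such a constructor could look like. The class name, the alpha default, and the list-based storage of W and b are my own simplifications; the original approach builds the variables as strings and runs them through exec(), but plain lists keep the sketch short and readable:

```python
import numpy as np
from random import randint

class NeuralNetwork:
    def __init__(self, x, y, alpha=0.05, layers=2):
        self.x = x            # input values
        self.y = y            # expected output values
        self.alpha = alpha    # learning rate
        self.layers = layers  # number of layers (default = 2)
        self.loss = 0         # running loss, reset every iteration

        # random node counts for the hidden layers, between 2 and 10
        hidden = [randint(2, 10) for _ in range(layers - 1)]
        # add the input and output layer sizes (1 here) at the first and last positions
        self.num_nodes = [1] + hidden + [1]

        # weights and biases shaped from num_nodes, initialised randomly in [0, 1)
        self.W = [np.random.rand(self.num_nodes[i + 1], self.num_nodes[i])
                  for i in range(len(self.num_nodes) - 1)]
        self.b = [np.random.rand(self.num_nodes[i + 1], 1)
                  for i in range(len(self.num_nodes) - 1)]
```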

Training Function

This function will be called when we want to (as the name suggests) train our Neural Network. In it, we will loop through all our input values, execute the forward prop, and save the value of the last layer as our output in the global variable y_hat.

Using the above-mentioned output (ŷ) and our actual output (y), we will feed these values into the backward prop to update our Weights and Biases.

Finally, we will calculate the Loss (the error) of our output using the NN_loss function.
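A rough sketch of this training loop inside the same class, assuming one scalar example is processed at a time. The method names feedforward() and backprop() are placeholders of mine; NN_loss is the name used in the text:

```python
    def train(self):
        for i in range(len(self.x)):
            # forward prop: the value of the last layer becomes our prediction y_hat
            z0 = np.array([[self.x[i]]])
            self.y_hat = self.feedforward(z0)

            # backward prop: update the Weights and Biases using y_hat and the true y
            y = np.array([[self.y[i]]])
            self.backprop(y)

            # accumulate the loss for this example
            self.NN_loss(y)
```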

Feed Forward Step

The first step will be to compute our forward propagation. But before that, we will discuss the Sigmoid helper function, a.k.a. the Activation Function. It could be replaced by a ReLU or tanh, but we will be using sigmoid here.

The Sigmoid is a squashing function that maps any value in the range (-∞, ∞) to a value in (0, 1). This is applied element-wise to the matrix passed into the function.
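In NumPy this helper is essentially a one-liner, applied to the whole matrix at once:

```python
    @staticmethod
    def sigmoid(z):
        # sigma(z) = 1 / (1 + e^(-z)), applied element-wise
        return 1.0 / (1.0 + np.exp(-z))
```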

Now, we come to the actual function we wanted to talk about.

Unlike the other layers, the input layer does not have an activation function. It is directly multiplied by the first Weight matrix, and the first Bias matrix is added, to get our first hidden layer (Z1).

This “Z” is put into the Activation Function we talked about earlier to squash the values we received.

The same process is then carried out in a loop, starting from the second hidden layer, only this time the “A” matrix from the previous calculation is multiplied with “W”.

Finally, the value of the last layer is returned to the train() function as the output.
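A sketch of this feedforward pass, continuing the same class. The intermediate Z and A matrices are kept on the object because the backprop step will need them; storing the raw input as A[0] is a convenience of mine, since the input layer itself has no activation:

```python
    def feedforward(self, z0):
        # the input layer has no activation, so the raw input is stored as A[0]
        self.Z = [z0]
        self.A = [z0]

        # first hidden layer: multiply the input by W[0], add b[0], then squash
        z = self.W[0] @ self.A[0] + self.b[0]
        self.Z.append(z)
        self.A.append(self.sigmoid(z))

        # remaining layers: the previous "A" is multiplied with the next "W"
        for l in range(1, len(self.W)):
            z = self.W[l] @ self.A[-1] + self.b[l]
            self.Z.append(z)
            self.A.append(self.sigmoid(z))

        # the last layer's value is returned to train() as the output
        return self.A[-1]
```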

Back Prop Step

Just like in the feedforward step, we will first discuss the helper function that supports the backprop calculations; here, instead of the Sigmoid, we will use the derivative of the Sigmoid function.

It is formed by multiplying the sigmoid function by 1 minus itself, i.e. σ(z) · (1 − σ(z)). Its execution in code is:
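A minimal version might look like this, recomputing the sigmoid internally:

```python
    @staticmethod
    def sigmoid_derivative(z):
        # sigmoid(z) * (1 - sigmoid(z)), computed element-wise
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)
```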

Now we come to the backprop part, where we use the chain rule to find the change in the Loss “J” with respect to the “Z” matrix. This slope of “J” w.r.t. “Z” is represented as “dZ”.

This value of dZ is further used to find the change in “W” and “b” w.r.t. “J”, so instead of calculating the change all the way back to J, we can directly use “dZ”. These slopes of the Weights and Biases are represented as “dW” and “db”.

As the last layer is followed by the Loss function, the equation of its derivative is a little different. Hence, the dZ of the last layer is calculated separately, whereas all the other dZ values are calculated within a loop.

After calculating all the dZ values, we move on to updating our weights using dW. To get dW, we multiply dZ with the A of the previous layer; once we have dW, we multiply it by our learning rate (alpha) and subtract it from the old W to get the new W.

The calculation of the last dW is done differently, as we don't have an A0 (i.e. no activation on the input layer); the raw input is used instead.

Similarly, for the biases we directly multiply the dZ value by alpha and subtract it from the old bias to get the new bias; this is done in a loop.
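Putting the whole backward pass together, here is a sketch continuing the same class. The exact loss function is not written out in this section, so I assume a simple squared error, which gives the separate dZ expression for the last layer; everything else follows the loops described above:

```python
    def backprop(self, y):
        L = len(self.W)              # number of weight/bias matrices
        dZ = [None] * (L + 1)

        # last layer: its dZ comes from the derivative of the Loss (squared error assumed)
        dZ[L] = (self.A[L] - y) * self.sigmoid_derivative(self.Z[L])

        # every other dZ is found in a loop using the chain rule
        for l in range(L - 1, 0, -1):
            dZ[l] = (self.W[l].T @ dZ[l + 1]) * self.sigmoid_derivative(self.Z[l])

        # update W and b with the learning rate alpha; A[0] holds the raw input,
        # which covers the special case of the first layer having no activation
        for l in range(L, 0, -1):
            dW = dZ[l] @ self.A[l - 1].T
            db = dZ[l]
            self.W[l - 1] -= self.alpha * dW
            self.b[l - 1] -= self.alpha * db
```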

Calculation of Loss

Now that we have propagated through all our calculations, we will compute the accuracy of the prediction of this model. In the first few iterations it is expected that the accuracy will be low (i.e. loss will be high), and as we go on to train this network, we will see the error of our calculations decrease.

We calculate the loss for each example and add it to the global loss variable. At the end of each iteration, this total loss is divided by the number of examples to get the average loss per example in our sample.
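A sketch of the loss helper, again assuming squared error; the average_loss() name is a placeholder of mine for the per-iteration division described above:

```python
    def NN_loss(self, y):
        # squared error for one example, accumulated into the global loss
        self.loss += float(np.sum((self.y_hat - y) ** 2))

    def average_loss(self):
        # average loss per example for the current iteration
        return self.loss / len(self.x)
```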

Resetting Loss

At the end of each iteration, as we are starting again from the beginning, we have to reset our loss to 0; otherwise, it will keep increasing as we iterate again and again.
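The reset itself is a one-line method, called once per iteration:

```python
    def reset_loss(self):
        # start each new pass over the data with a clean slate
        self.loss = 0
```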

Predicting Values

Finally, now that we have a fully trained model, we will use our trained weights and biases to get results from our own input and see for ourselves how well the model performs.

We will directly set the user-given input as the first layer of our model (i.e. z0), perform the feedforward method, and return the last layer.
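A sketch of such a predict method, reusing the trained weights and biases through the same feedforward pass (the method name is a placeholder):

```python
    def predict(self, x):
        # the user-given value becomes the input layer z0 directly
        z0 = np.array([[x]])
        # run the trained network forward and return the last layer
        return self.feedforward(z0)
```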

Next Steps

Next, we will write our driver code, define our input/output values, and cover more points on how we can improve our Framework.

You can go to the next part of this series using this link:
