
When one starts to develop their own neural networks, it is easy to get overwhelmed by the wide variety of options available for each parameter of the model. Which activation function to use for each hidden layer? Which activation function for the output layer? When to use Binary Cross Entropy vs Categorical Cross Entropy? Such questions will keep coming until we have a firm understanding of what each option does, its pros and cons, and when one should use it. We will be going through the key features of popular Activation Functions and Loss Functions, as well as understanding when one should use which. In case you need a refresher on how neural networks work or what an activation or loss function is, please refer to this blog. So without any further delay, let's dive in!

Activation Functions

The activation function of a neuron defines its output given its inputs. We will be talking about 4 popular activation functions:

Sigmoid

Description: Takes a real-valued number and scales it between 0 and 1. Large negative numbers become 0 and large positive numbers become 1.
Formula: 1 / (1 + e^-x)
Range: (0, 1)
Pros: As its range is between 0 and 1, it is ideal for situations where we need to predict the probability of an event as an output.
Cons: The gradient values are significant in the range -3 to 3 but become much closer to zero beyond it, which almost kills the impact of the neuron on the final output. Also, sigmoid outputs are not zero-centered (they are centered around 0.5), which leads to undesirable zig-zagging dynamics in the gradient updates for the weights.

Softmax

Range: (0, 1), sum of outputs = 1
Pros: Can handle multiple classes and gives the probability of belonging to each class.
Cons: Should not be used in hidden layers, as we want the neurons there to be independent; if we apply softmax to them, they will be linearly dependent.

ReLU

Description: The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive; otherwise, it outputs zero. This is the default behaviour, but modifying the default parameters allows us to use non-zero thresholds and to use a non-zero multiple of the input for values below the threshold (called Leaky ReLU).
Formula: max(0, x)
Range: [0, inf)
Pros: Although ReLU looks and acts like a linear function, it is nonlinear, allowing complex relationships to be learned, and it keeps learning alive through all the hidden layers of a deep network by having large derivatives.
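To make the definitions above concrete, here is a minimal NumPy sketch of all three functions. It is a rough illustration rather than any particular library's implementation; in particular, the threshold and negative_slope parameter names are illustrative assumptions standing in for the configurable threshold and Leaky-ReLU multiple mentioned in the ReLU description.

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^-x): squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Exponentiate and normalise so the outputs sum to 1.
    # Subtracting the max first is a standard numerical-stability trick.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def relu(x, threshold=0.0, negative_slope=0.0):
    # max(0, x) with the defaults; a non-zero negative_slope for
    # values below the threshold gives Leaky ReLU.
    # (Parameter names are illustrative, not a specific library's API.)
    return np.where(x > threshold, x, negative_slope * x)

x = np.array([-5.0, -3.0, 0.0, 3.0, 5.0])

print(sigmoid(x))                    # all values in (0, 1)
print(softmax(x), softmax(x).sum())  # class probabilities summing to 1.0
print(relu(x))                       # negatives clipped to 0
print(relu(x, negative_slope=0.01))  # Leaky ReLU: small slope for x < 0

# The sigmoid gradient sigmoid(x) * (1 - sigmoid(x)) illustrates the
# vanishing-gradient con: it peaks at 0.25 for x = 0 and is already
# only about 0.045 at |x| = 3.
s = sigmoid(x)
print(s * (1 - s))
```

The last two lines back up the sigmoid cons numerically: the gradient never exceeds 0.25 and shrinks rapidly outside roughly -3 to 3, so saturated sigmoid neurons contribute almost nothing to the weight updates, which is exactly why a large-derivative function like ReLU keeps learning alive in deep networks.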
