Loss functions in Deep Learning
While there are a ton of concepts related to deep learning scattered all around the internet, I thought why not have just one place where you can find all the fundamental concepts needed to set up your own Deep Neural Network (DNN) architecture? This series can be viewed as a reference guide that you can come back to whenever you want to brush up on the basics. In this first part, I will discuss one of the most essential elements of deep learning - the loss function! I call it the "Oxygen of Deep Learning" because, without a loss function, a neural network cannot be trained (so it would just be dead).
A loss function, also called an objective function or a cost function, shows us how bad our neural network's predictions are - it quantifies our unhappiness with the scores (another word for predictions) across the training data. So the lower the loss, the better our model is. An abstract formulation (for an image classification task) is as follows: given an image $x_i$ with true label $y_i$, the model with weights $W$ produces a vector of class scores $s = f(x_i, W)$, a per-example loss $L_i(f(x_i, W), y_i)$ measures how unhappy we are with those scores, and the total loss is the average over the dataset, $L = \frac{1}{N}\sum_{i=1}^{N} L_i(f(x_i, W), y_i)$.
The score of the true class is shown in bold. The class with the maximum score is the predicted label (the class to which the image belongs according to our model). Based on these scores, a loss is computed for each input image, and the final loss is the average of the losses over all input images. We will talk about two of the most commonly used loss functions in deep learning - SVM loss and cross-entropy (softmax) loss.
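As a minimal sketch of this pipeline (assuming NumPy, a batch of raw score vectors, and integer class labels - the function names here are my own, not from any particular library), any per-example loss can be plugged in and averaged over the batch:

```python
import numpy as np

def total_loss(scores, labels, per_example_loss):
    """Average a per-example loss over a batch of score vectors.

    scores: (N, C) array of raw class scores, one row per image.
    labels: (N,) array of integer true-class indices.
    per_example_loss: function (score_vector, true_index) -> scalar loss.
    """
    losses = [per_example_loss(s, y) for s, y in zip(scores, labels)]
    return np.mean(losses)
```

The SVM loss and cross-entropy loss discussed below are two choices for `per_example_loss`.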
SVM Loss
The intuition behind the Support Vector Machine (SVM) loss is that the score of the correct class should be higher than all the other scores by some threshold (margin). This seems reasonable, as we would want our classifier to assign a high score to the right category and low scores to all the wrong categories. For an example with scores $s$ and true class $y_i$, the SVM loss has the form

$$L_i = \sum_{j \neq y_i} \max(0, \; s_j - s_{y_i} + \Delta)$$

where $\Delta$ is the margin (commonly set to 1).
The SVM loss is also called hinge loss because its plot is shaped like a door hinge. If the score of the correct class is greater than every other score by at least the margin, the loss is zero; otherwise, the loss increases linearly with the amount of violation.
Let's derive the SVM loss for our example. For the cat image, we compare the true-class (cat) score against each of the other class scores: every wrong class whose score is not at least the margin below the cat score adds its violation to the loss, while classes that are already far enough below contribute nothing.
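A small sketch of this computation, assuming NumPy and a hypothetical 3-class score vector (the numbers below are illustrative, not necessarily the ones from the figure):

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss for a single example.

    scores: (C,) raw class scores; y: index of the true class.
    Adds (s_j - s_y + margin) for every wrong class j whose score is not
    at least `margin` below the true-class score.
    """
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the true class never contributes to its own loss
    return margins.sum()

# Hypothetical scores for (cat, car, frog), with class 0 (cat) correct:
scores = np.array([3.2, 5.1, -1.7])
print(svm_loss(scores, y=0))  # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```

Only the car score violates the margin here, so it alone drives the loss; the frog score is already far enough below the cat score and contributes zero.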
Cross-Entropy Loss
Cross-entropy loss, also called logistic loss, interprets the raw classifier scores as probabilities. We take the raw scores and run them through the exponential function, which makes sure that all the values are positive. We then normalize these values to obtain a probability distribution over the categories. This transformation is called the softmax function:

$$p_k = \frac{e^{s_k}}{\sum_j e^{s_j}}$$

Let's do this on our first cat image example.
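A minimal softmax sketch in NumPy (again using hypothetical scores; subtracting the maximum score first is a standard numerical-stability trick that does not change the resulting probabilities):

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    shifted = scores - np.max(scores)   # for numerical stability
    exps = np.exp(shifted)              # all values become positive
    return exps / exps.sum()            # normalize to sum to 1

# Hypothetical raw scores for three classes:
print(softmax(np.array([3.2, 5.1, -1.7])))  # positive values summing to 1
```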
The cross-entropy between a "true" distribution $p$ and an estimated distribution $q$ is defined as $H(p, q) = -\sum_{x} p(x)\log q(x)$. In classification, the true distribution puts all of its mass on the correct class, so the loss for an example reduces to the negative log of the softmax probability assigned to the true class:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$
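Putting the two steps together, a sketch of the per-example cross-entropy loss (same hypothetical scores as above; the helper name is my own):

```python
import numpy as np

def cross_entropy_loss(scores, y):
    """Cross-entropy loss for a single example.

    With a one-hot "true" distribution, the cross-entropy reduces to the
    negative log of the softmax probability of the true class y.
    """
    shifted = scores - np.max(scores)                   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum()) # log softmax
    return -log_probs[y]

scores = np.array([3.2, 5.1, -1.7])    # hypothetical scores, class 0 correct
print(cross_entropy_loss(scores, y=0)) # equals -log(softmax(scores)[0])
```

This loss can also be plugged directly into the `total_loss` helper sketched earlier to get the average loss over a batch.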
While these are the two most commonly used loss functions, a longer list of loss functions can be found here - Loss Functions