Posts

Showing posts from September, 2022

Image Classification using CNNs: MNIST dataset

Image classification is a fundamental task in computer vision that attempts to comprehend an entire image as a whole. The goal is to classify the image by assigning it to a specific class label. Typically, image classification refers to images in which only one object appears and is analyzed. Now that we have all the ingredients required to code up our deep learning architecture, let's dive right into creating a model that can classify handwritten digits (the MNIST dataset) using Convolutional Neural Networks from scratch. The MNIST dataset consists of 70,000 grayscale $28 \times 28$ images of handwritten digits. I will be using PyTorch for this implementation. Don't forget to change the runtime to GPU to get accelerated processing! The first step is to import the relevant libraries that we will be using throughout our code.

```python
import torch
from torch import nn
import torchvision
import matplotlib.pyplot as plt
```

Downloading and Pre-processing the Dataset

The dataset can be downloaded and pre-processed directly through torchvision.
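As a minimal sketch of what that download and batching step might look like (the root path, transform, and batch size below are illustrative choices, not necessarily what the post uses):

```python
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Download the MNIST training and test splits and convert each image to a tensor.
transform = transforms.ToTensor()
train_data = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_data = torchvision.datasets.MNIST(root="./data", train=False, download=True, transform=transform)

# Wrap the datasets in DataLoaders; a batch size of 64 is an arbitrary choice here.
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)

# Use the GPU if the runtime provides one.
device = "cuda" if torch.cuda.is_available() else "cpu"
```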

Understanding Batch Normalization

In the previous post, we talked about Convolution and Pooling layers. Stacking a large number of these layers (convolutions with activation functions and pooling) results in a deep CNN architecture, which is often hard to train: once the network becomes very deep, it becomes very difficult for it to converge. The most common solution to this problem is Batch Normalization. The idea is to normalize the outputs of a layer so that they have zero mean and unit variance. Why? The distribution of the inputs to layers deep in the network may change after each mini-batch as the weights are updated, which can cause the learning algorithm to forever chase a moving target. This change in the distribution of inputs to layers in the network is known by the technical name "internal covariate shift", and batch normalization helps to reduce this internal covariate shift, improving optimization. We introduce batch normalization as a layer in our network that takes in inputs and normalizes them.
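As a minimal PyTorch sketch of this idea (the channel counts, kernel size, and input size are arbitrary assumptions), batch normalization is typically dropped in as a layer right after a convolution:

```python
import torch
from torch import nn

# A small convolutional block with a batch normalization layer after the convolution.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # normalizes each of the 16 output channels over the mini-batch
    nn.ReLU(),
)

x = torch.randn(8, 3, 32, 32)   # a dummy mini-batch of 8 RGB images
out = block(x)

# During training, BatchNorm2d subtracts the per-channel batch mean and divides by the
# batch standard deviation, then applies a learnable scale and shift.
print(out.shape)  # torch.Size([8, 16, 32, 32])
```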

Neural Network for Images: Convolutional Neural Networks (CNNs)

The linear neural networks that we have talked about up to this point do not work well when dealing with image data, as they don't respect the spatial structure of images: when such a network is applied to a 2D image, it flattens the image into 1D and then uses that as input. This creates a need for a new computational node that operates on images - Convolutional Neural Networks (CNNs). A convolution layer takes a 3D image tensor (3 × H × W) as input and has a 3D filter (also called a kernel) that convolves over the image, i.e. slides over it spatially, computing dot products. It should be noted that since the sliding happens only in the two spatial dimensions, the number of depth channels in the image (here 3, for RGB) always has to match the number of depth channels in the filter. The figure below shows an example of the convolution operation. The values in the filter are the weights that are learned during training, and the same filter gets applied over all positions of the image. Here is what it would look like in 3D.
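A minimal PyTorch sketch of the same point (the image size, kernel size, and number of filters are illustrative assumptions): the filter's input channels must match the image's depth channels, and each filter produces one 2D activation map.

```python
import torch
from torch import nn

# A dummy batch of one RGB image: (batch, channels, height, width).
x = torch.randn(1, 3, 32, 32)

# in_channels must match the image's 3 depth channels;
# out_channels (8 here) is the number of filters, each learned during training.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

out = conv(x)
print(out.shape)  # torch.Size([1, 8, 28, 28]) -- 32 - 5 + 1 = 28 with no padding
```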