Posts

Showing posts from December, 2022

Generative Adversarial Networks: A Two-player game

Image
Introduced in 2014 by Goodfellow. et al , Generative Adversarial Networks (GANs) revolutionized the field of Generative modeling. They proposed a new framework that generated very realistic synthetic data trained through a minimax two-player game. With GANs, we don't explicitly learn the distribution of the data $p_{\text{data}}$, but we can still sample from it. Like VAEs, GANs also have two networks: a Generator and a Discriminator that are trained simultaneously. A latent variable is sampled from a prior $\mathbf{z} \sim p(\mathbf{z})$ and passed through the Generator to obtain a fake sample $\mathbf{x} = G(\mathbf{z})$. Then the discriminator performs an image classification task that takes in all samples and classifies it as real ($1$) or fake ($0$). These are trained together such that while competing with each other, they make each other stronger at the same time. To summarize, A discriminator $D$ estimates the probability of a given sample coming from the

Generative Modeling with Variational Autoencoders

Image
Till now, I have talked about supervised learning where our model is trained to learn a mapping function of data $\mathbf{x}$ to predict labels $\mathbf{y}$, for example, the tasks of classification, object detection, segmentation, image captioning, etc. However, supervised learning requires large datasets that are created with human annotations to train the models. The other side of machine learning is called unsupervised learning where we just have data $\mathbf{x}$, and the goal is to learn some hidden underlying structure of the data using a model. Through this approach, we aim to learn the distribution of the data $p(\mathbf{x})$ and then sample from it to generate new data, in our case images. Autoencoders An autoencoder is a bottleneck structure that extracts features $\mathbf{z}$ from the input images $\mathbf{x}$ and uses them to reconstruct the original data $\mathbf{x}'$, learning an identity mapping. Encoder: network that compresses the input into a latent-s