Kolkata, India · hardikk.jaiswal@gmail.com
by Hardik Jaiswal
For the last few weeks I have been deep diving into deep learning, trying to understand how it works in practice. For this I decided to go with a popular book, Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.
Today I was trying out two popular convolutional neural network architectures: ResNet and CycleGAN. You'd expect deeper networks = better performance. But nope. When you start stacking a ton of layers (like 50, 101, etc.), the gradients (the signals that update your weights) start vanishing during backprop, and the model stops learning.
Activation functions like ReLU help, but not enough.
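You can actually watch this happen in a few lines of PyTorch. The sketch below is a toy setup of my own (the layer count, the 64-unit width, and the use of Sigmoid are arbitrary choices to make the effect obvious, not anything from the book):

```python
import torch
import torch.nn as nn

# A toy "plain" deep network: 50 stacked Linear + Sigmoid layers.
layers = []
for _ in range(50):
    layers += [nn.Linear(64, 64), nn.Sigmoid()]
plain_net = nn.Sequential(*layers)

x = torch.randn(8, 64)
plain_net(x).sum().backward()

# Compare gradient magnitudes at the first layer vs the last layer.
print(f"grad norm, first layer: {plain_net[0].weight.grad.norm():.2e}")
print(f"grad norm, last layer:  {plain_net[-2].weight.grad.norm():.2e}")
# On a typical run the first layer's gradient is many orders of magnitude
# smaller — that's the vanishing gradient problem.
```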
That's where ResNet walks in with a genius idea called residual learning.
Instead of trying to learn the full mapping H(x), it learns only the difference:

F(x) = H(x) − x ⟹ H(x) = F(x) + x
This adds the original input x back to the output of a few layers using a skip connection. So in a 101-layer ResNet, data still flows through all the layers but at checkpoints, it jumps ahead and re-enters, keeping the gradient alive and learning stable.
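In PyTorch, a residual block is surprisingly small. Here's a simplified sketch of the idea (the two-conv layout and the channel count are illustrative, not the exact block used in ResNet-101):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x (the skip connection)."""
    def __init__(self, channels: int):
        super().__init__()
        # F(x): a couple of conv layers that learn the *difference* from the input.
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Skip connection: add the original input back, then apply ReLU.
        return torch.relu(self.f(x) + x)

# Shapes are preserved, so blocks can be stacked dozens of times.
block = ResidualBlock(64)
x = torch.randn(1, 64, 32, 32)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```

Because the `+ x` path is just addition, gradients can flow straight through it during backprop, which is exactly what keeps very deep networks trainable.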
Thus ResNet gives us stable gradient flow and the ability to train really deep networks (50, 101+ layers) without learning stalling out.
Imagine having two domains (like Horses and Zebras) but no perfectly matched image pairs (no HorseA ↔ ZebraA). That's where CycleGAN pulls up: it learns to translate between the two domains without any paired examples.
The trick? Cycle Consistency.
"If I go from Horse → Zebra → Horse again, I should end up close to the original horse." This forces the model to preserve content while changing style. It learns to translate rather than randomly generate.
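Here's a rough sketch of what that cycle-consistency term looks like in PyTorch. The two "generators" below are stand-in single conv layers just so the snippet runs; in a real CycleGAN they are full encoder-decoder networks:

```python
import torch
import torch.nn as nn

# Stand-in generators (hypothetical): one translates Horse -> Zebra,
# the other Zebra -> Horse.
g_horse2zebra = nn.Conv2d(3, 3, kernel_size=3, padding=1)
g_zebra2horse = nn.Conv2d(3, 3, kernel_size=3, padding=1)

l1 = nn.L1Loss()
real_horse = torch.randn(1, 3, 256, 256)
real_zebra = torch.randn(1, 3, 256, 256)

# Cycle consistency: Horse -> Zebra -> Horse should reconstruct the original,
# and the same in the other direction.
cycle_loss = (
    l1(g_zebra2horse(g_horse2zebra(real_horse)), real_horse)
    + l1(g_horse2zebra(g_zebra2horse(real_zebra)), real_zebra)
)
print(cycle_loss)
```

This term gets added (with a weighting factor) on top of the usual adversarial GAN losses, and it's what keeps the translation anchored to the input image instead of drifting into random generation.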
Deep learning isn't magic; it's smart math with clever tricks. ResNet taught me that learning the difference can be easier than learning the whole thing. CycleGAN showed me that you can translate without knowing exact mappings, just by enforcing logical structure.
tags: Python - Computer Vision - AI-ML