
Hardik Jaiswal

Kolkata, India hardikk.jaiswal@gmail.com GitHub

6 July 2025

Diving into popular CNN architectures

by Hardik Jaiswal

For the last few weeks I have been diving deep into deep learning and trying to understand how it works in practice. For this I decided to go with a popular book: Deep Learning with PyTorch by Eli Stevens, Luca Antiga and Thomas Viehmann.

Today I was trying out some popular Convolutional Neural Network architectures: ResNet and CycleGAN. You would think that deeper networks = better performance. But nope. When you start stacking a ton of layers (like 50, 101, etc.), the gradients (that update your weights) start vanishing during backprop: each layer multiplies the gradient by another small factor, so by the time it reaches the early layers there is almost nothing left. Those layers barely get updated, and the model stops learning.

Activation functions like ReLU help, but not enough on their own.
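
To make the vanishing part concrete, here is a tiny toy I put together (not from the book). It assumes a made-up "network" of scalar layers h -> tanh(w * h) with a hypothetical shared weight w = 0.5, and just multiplies the local derivatives together the way backprop would. Real networks use matrices and different activations, so treat the numbers as an illustration only.

```python
import math


def plain_chain_gradient(depth, w=0.5, x=1.0):
    """Gradient of the output w.r.t. the input for `depth` stacked
    scalar layers h -> tanh(w * h). Toy setup, w and x are assumptions."""
    h, grad = x, 1.0
    for _ in range(depth):
        h = math.tanh(w * h)
        # Chain rule: d/dh tanh(w * h) = w * (1 - tanh(w * h)^2)
        grad *= w * (1.0 - h * h)
    return grad


if __name__ == "__main__":
    for depth in (5, 50, 101):
        print(depth, plain_chain_gradient(depth))
```

Every layer contributes a factor of at most 0.5 here, so the gradient shrinks at least as fast as 0.5^depth.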


ResNet - Residual Network

That’s where ResNet walks in with a genius idea called residual learning.

Instead of trying to learn the full mapping H(x), it learns only the difference:

F(x) = H(x) − x ⟹ H(x) = F(x) + x

In practice, a skip connection adds the original input x back to the output of a few layers. So in a 101-layer ResNet, data still flows through all the layers, but every few layers a copy of the input skips ahead and is added back in. That gives the gradient a direct path backward, keeping it alive and learning stable.
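
Here is a rough sketch of why adding x back helps, again as a scalar toy of my own rather than a real ResNet block. It assumes each block computes h + F(h) with F(h) = tanh(w * h) and a hypothetical w = 0.5. The point is only that the identity term adds a 1 to every local derivative.

```python
import math


def residual_chain_gradient(depth, w=0.5, x=1.0):
    """Gradient of the output w.r.t. the input for `depth` stacked
    scalar residual blocks h -> h + tanh(w * h). Toy setup only."""
    h, grad = x, 1.0
    for _ in range(depth):
        f = math.tanh(w * h)  # F(h): the residual branch
        # d/dh [h + F(h)] = 1 + w * (1 - tanh(w * h)^2)
        grad *= 1.0 + w * (1.0 - f * f)
        h = h + f  # H(h) = F(h) + h, the skip connection
    return grad


if __name__ == "__main__":
    for depth in (5, 50, 101):
        print(depth, residual_chain_gradient(depth))
```

In this toy each factor is at least 1, so the gradient cannot shrink toward zero no matter how many blocks are stacked. Real ResNets also rely on things like normalization and convolutions, which this sketch leaves out.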

Thus ResNet provides us:


CycleGAN: Translate Horses to Zebras Like a Boss

Imagine having two domains (like Horses and Zebras) but without having perfectly matched image pairs (like HorseA ↔ ZebraA). That’s where CycleGAN pulls up with:

The trick? Cycle Consistency.

“If I go from Horse ➞ Zebra ➞ Horse again, I should end up close to the original horse.” This forces the model to preserve content while changing style. It learns to translate rather than randomly generate.
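
The round trip idea can be sketched as a loss in a few lines. This is a toy of mine, not CycleGAN code: "images" are plain lists of floats, the two generators (`to_zebra`, `to_horse`) are hypothetical hand-written functions instead of trained networks, and I use a mean absolute difference as the distance. The adversarial part of the training is left out entirely.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized 'images'."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x, forward, backward):
    """How far the round trip backward(forward(x)) lands from x."""
    return l1_distance(backward(forward(x)), x)


# Hypothetical stand-ins for the two generators.
def to_zebra(img):
    return [p * 0.5 + 0.25 for p in img]


def to_horse(img):
    # Exact inverse of to_zebra, so the round trip is lossless.
    return [(p - 0.25) / 0.5 for p in img]


def to_horse_lazy(img):
    # Ignores its input and always returns the same gray image.
    return [0.5 for _ in img]


if __name__ == "__main__":
    horse = [0.0, 0.25, 0.5, 1.0]
    print(cycle_consistency_loss(horse, to_zebra, to_horse))
    print(cycle_consistency_loss(horse, to_zebra, to_horse_lazy))
```

A generator pair that throws away the content of the input gets a high loss on the round trip, which is the pressure that pushes the model toward translating instead of generating something unrelated.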

Why does this even matter?


My Takeaway

Deep learning isn’t magic; it’s smart math with clever tricks. ResNet taught me that learning the difference can be easier than learning the whole thing. CycleGAN showed me that you can translate without knowing exact mappings, just by enforcing logical structure.

tags: Python - Computer Vision - AI-ML