6 July 2025

Diving into popular CNN architectures

by Hardik Jaiswal

For the last few weeks I have been diving deep into deep learning, trying to understand how it works in practice. For this I decided to go with a popular book: Deep Learning with PyTorch by Eli Stevens, Luca Antiga and Thomas Viehmann.

Today I was trying out two popular architectures built on Convolutional Neural Networks: ResNet and CycleGAN. You’d think deeper networks = better performance. But nope. When you start stacking a ton of layers (like 50, 101, etc.), the gradients (that update your weights) can start vanishing during backprop. Each layer multiplies the gradient by another factor, and when those factors are mostly smaller than 1, the signal that reaches the early layers shrinks toward zero. Those layers then barely learn.

Activation functions like ReLU help, but not enough on their own.
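To make the shrinking gradient concrete, here is a tiny sketch in plain Python (no PyTorch, so it runs anywhere). It assumes a toy "network" that is just a chain of scalar sigmoid layers sharing one fixed weight; the function name `plain_chain_gradient`, the weight of 1.0 and the input 0.5 are made up for illustration and are not from the book.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def plain_chain_gradient(x, depth, weight=1.0):
    """Return d(output)/d(input) for `depth` stacked layers h = sigmoid(weight * h).

    Toy assumption: every layer is a single scalar sigmoid unit with the same weight.
    """
    h = x
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(weight * h)
        # Chain rule: each layer multiplies the gradient by weight * sigmoid'(z),
        # and sigmoid'(z) = s * (1 - s) is never larger than 0.25.
        grad *= weight * s * (1.0 - s)
        h = s
    return grad


# The gradient with respect to the input collapses as the chain gets deeper.
for depth in (5, 20, 50):
    print(depth, plain_chain_gradient(0.5, depth))
```

Sigmoid is used here only because its derivative is always at most 0.25, which makes the effect easy to see. Real networks with ReLU shrink more slowly, but the same multiplication happens at every layer.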

ResNet - Residual Network

That’s where ResNet walks in with a genius idea called residual learning.

Instead of trying to learn the full mapping 𝐻(𝑥), a block of layers learns only the difference (the residual):

𝐹(𝑥) = 𝐻(𝑥) − 𝑥 ⟹ 𝐻(𝑥) = 𝐹(𝑥) + 𝑥

A skip connection adds the original input x back to the output of a few layers. So in a 101-layer ResNet, data still flows through all the layers, but every few layers a copy of the input also jumps ahead and is added back in. During backprop that shortcut gives the gradient a direct path to the earlier layers, which keeps it alive and makes training stable.
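Here is a rough sketch of why the "+ x" matters for the gradient, again in plain Python. It assumes each residual branch 𝐹 is a single scalar sigmoid unit, which is far simpler than the convolutional blocks in a real ResNet; `branch` and `residual_chain_gradient` are hypothetical names used only for this illustration.

```python
import math


def branch(h, weight=1.0):
    """Toy residual branch F(h): one scalar sigmoid unit (an assumption)."""
    return 1.0 / (1.0 + math.exp(-weight * h))


def residual_chain_gradient(x, depth, weight=1.0):
    """Return d(output)/d(input) for `depth` stacked blocks H(h) = F(h) + h."""
    h = x
    grad = 1.0
    for _ in range(depth):
        f = branch(h, weight)
        # H(h) = F(h) + h, so dH/dh = F'(h) + 1.
        # The "+ 1" is the skip path; without it the factor would be F'(h) alone,
        # which is at most 0.25 for a sigmoid.
        grad *= weight * f * (1.0 - f) + 1.0
        h = f + h  # skip connection: add the block input back to the branch output
    return grad


# Every per-block factor is at least 1, so the gradient does not vanish with depth.
for depth in (5, 20, 50):
    print(depth, residual_chain_gradient(0.5, depth))
```

This only shows the identity path at work in a toy setting. A real residual block wraps convolutions, normalization and ReLU around the same `F(x) + x` pattern.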

So ResNet gives us:

Deep networks that are trainable again
A way around the degradation problem (deeper plain networks ending up with higher training error than shallower ones)
Skips + additions = stability + speed

CycleGAN: Translate Horses to Zebras Like a Boss

Imagine having two domains (like Horses and Zebras) but no perfectly matched image pairs (like HorseA ↔ ZebraA). That’s where CycleGAN pulls up with:

2 Generators: G (Horse ➝ Zebra), F (Zebra ➝ Horse)
2 Discriminators: D_A (for Horses), D_B (for Zebras)

The trick? Cycle Consistency.

“If I go from Horse ➝ Zebra ➝ Horse again, I should end up close to the original horse.” This forces the model to preserve content while changing style. It learns to translate rather than randomly generate.
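The round trip can be written down as a loss. Below is a minimal sketch in plain Python, assuming images are just short lists of floats and the generators are ordinary functions. CycleGAN measures the round-trip error with an L1 distance, which is what `l1_distance` imitates. The toy generators `G`, `F_inverse` and `F_lossy` are invented for this example, and the adversarial losses from D_A and D_B are left out entirely.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized 'images'."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, horse, zebra):
    """Round-trip error in both directions (G: Horse -> Zebra, F: Zebra -> Horse)."""
    forward = l1_distance(F(G(horse)), horse)   # Horse -> Zebra -> Horse
    backward = l1_distance(G(F(zebra)), zebra)  # Zebra -> Horse -> Zebra
    return forward + backward


# Toy stand-ins for the generators (assumptions, not real networks).
def G(img):
    return [p + 0.5 for p in img]   # pretend "add stripes"


def F_inverse(img):
    return [p - 0.5 for p in img]   # undoes G exactly, so content is preserved


def F_lossy(img):
    return [0.0 for _ in img]       # throws the content away


horse = [0.25, 0.5, 0.75]
zebra = [0.75, 1.0, 1.25]

print(cycle_consistency_loss(G, F_inverse, horse, zebra))  # round trip recovers the input
print(cycle_consistency_loss(G, F_lossy, horse, zebra))    # round trip loses the input
```

A generator pair that forgets the content gets a large penalty, while a pair that can undo each other gets none. That pressure is what pushes the model toward translating instead of generating something unrelated.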

Why does this even matter?

You don’t need paired data
You can train artistic style transfer models (like turning selfies into anime)
It powers real-world stuff: satellite image translation, medical image synthesis, photo → sketch generators, etc.

My Takeaway

Deep learning isn’t magic; it’s smart math with clever tricks. ResNet taught me that learning the difference can be easier than learning the whole thing. CycleGAN showed me that you can translate between domains without paired examples, just by enforcing cycle consistency.

tags: Python - Computer Vision - AI-ML