I fine-tuned a pretrained Transformer on a small custom dataset.
Training finishes without errors.
But the generated outputs look random and off-topic.
It feels like the model forgot everything.
Why does my Transformer output nonsense when I fine-tune it on a small dataset?
Anushrita Ghosh (Beginner)
This happens because the model is overfitting the small dataset and catastrophically forgetting its pretrained knowledge.
When you fine-tune on a small dataset, the Transformer's weights drift away from the pretrained optimum that made it work in the first place. Use a lower learning rate and freeze the early layers:
# Freeze the pretrained backbone so only the task-specific head is updated
# (assumes a Hugging Face-style model exposing a `base_model` attribute)
for param in model.base_model.parameters():
    param.requires_grad = False
Also add weight decay and use early stopping on a held-out validation set.
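Early stopping can be as simple as tracking the best validation loss and halting once it stops improving for a few epochs. Here is a minimal, framework-agnostic sketch (the `patience` setting and loss values are illustrative, not from the original post):

```python
class EarlyStopper:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2)
losses = [0.9, 0.7, 0.71, 0.72, 0.73]  # validation loss per epoch (made up)
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        break  # halts after two epochs without improvement
```

On a tiny dataset this kind of check usually matters more than squeezing out extra training epochs, because the validation loss turns upward early.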
Common mistakes:
Using a learning rate that is too high (the pretraining rate is usually far too aggressive for fine-tuning)
Unfreezing and training all layers on a tiny dataset
Skipping regularization (no weight decay, no dropout, no early stopping)
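If fully freezing layers feels too blunt, a common middle ground is layer-wise learning-rate decay (popularized by ULMFiT): earlier, more general layers get smaller updates so pretrained features drift less. A minimal sketch of the schedule itself (the decay factor and layer count are illustrative assumptions):

```python
def layerwise_lrs(base_lr, num_layers, decay=0.9):
    """Per-layer learning rates, smallest for the earliest (most general) layers.

    The top layer trains at `base_lr`; each layer below is scaled by `decay`.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# With 12 layers and decay 0.9, layer 0 trains roughly 3x slower than layer 11.
lrs = layerwise_lrs(2e-5, 12)
```

In PyTorch you would pass these as per-group learning rates via optimizer parameter groups, one group per layer.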
The practical takeaway is that pretrained models need gentle fine-tuning, not aggressive retraining.