I fine-tuned a pretrained Transformer on a small custom dataset.
Training finishes without errors.
But the generated outputs look random and off-topic.
It feels like the model forgot everything.
Why does my Transformer output nonsense when I fine-tune it on a small dataset?
Anushrita Ghosh (Beginner)
This happens because the model is overfitting the small dataset and catastrophically forgetting its pretrained knowledge.
When you fine-tune on a small dataset, the Transformer's weights drift away from the pretrained optimum that made it work in the first place. Use a lower learning rate and freeze the early layers:
# Freeze the pretrained backbone so only the task-specific head is updated
# (assumes a Hugging Face-style model exposing a `base_model` attribute)
for param in model.base_model.parameters():
    param.requires_grad = False
Also add weight decay and use early stopping on a held-out validation set.
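Early stopping can be as simple as tracking the best validation loss and halting once it stops improving for a few epochs. Here is a minimal, framework-agnostic sketch (the `patience` setting and loss values are illustrative, not from the original post):

```python
class EarlyStopper:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2)
losses = [0.9, 0.7, 0.71, 0.72, 0.73]  # validation loss per epoch (made up)
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        break  # halts after two epochs without improvement
```

On a tiny dataset this kind of check usually matters more than squeezing out extra training epochs, because the validation loss turns upward early.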
Common mistakes:
Using a learning rate that is too high (the pretraining rate is usually far too aggressive for fine-tuning)
Unfreezing and training all layers on a tiny dataset
Skipping regularization (no weight decay, no dropout, no early stopping)
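If fully freezing layers feels too blunt, a common middle ground is layer-wise learning-rate decay (popularized by ULMFiT): earlier, more general layers get smaller updates so pretrained features drift less. A minimal sketch of the schedule itself (the decay factor and layer count are illustrative assumptions):

```python
def layerwise_lrs(base_lr, num_layers, decay=0.9):
    """Per-layer learning rates, smallest for the earliest (most general) layers.

    The top layer trains at `base_lr`; each layer below is scaled by `decay`.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# With 12 layers and decay 0.9, layer 0 trains roughly 3x slower than layer 11.
lrs = layerwise_lrs(2e-5, 12)
```

In PyTorch you would pass these as per-group learning rates via optimizer parameter groups, one group per layer.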
The practical takeaway is that pretrained models need gentle fine-tuning, not aggressive retraining.