
Question

Asked: December 14, 2025 · In: Deep Learning

Why does my Transformer’s training loss decrease but translation quality stays poor?

The training loss drops steadily during fine-tuning.
But the translated sentences are grammatically wrong.
BLEU and other quality metrics do not improve.
It feels like the model is optimizing the wrong thing.


1 Answer

Louis Armando · Answer 1 · 2026-01-14T16:46:24+00:00

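This happens because token-level loss does not capture sentence-level quality. During fine-tuning, a Transformer is trained with teacher forcing: at each position it predicts the next token given the reference prefix, not its own earlier predictions. It is never directly rewarded for producing a coherent, accurate full sequence, so it can become very good at predicting individual words while still producing poor translations when it has to generate on its own.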
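To see why a low token-level loss does not guarantee good output, here is a minimal sketch. It uses a hypothetical bigram table (`TOY_MODEL`) as a stand-in for a trained model; the table, its probabilities, and the function names are assumptions for illustration, and a real Transformer would also condition on the source sentence and the full prefix. Under teacher forcing, one plausible-but-wrong step costs little in the averaged loss, while free-running greedy decoding compounds that step into a degenerate output.

```python
import math

# Hypothetical bigram table standing in for a trained model: each entry maps
# the previous token to a next-token distribution. A real Transformer also
# conditions on the source sentence and the whole prefix.
TOY_MODEL = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.45, "dog": 0.55},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.9, "</s>": 0.1},
    "dog": {"dog": 0.6, "sat": 0.3, "</s>": 0.1},
    "sat": {"</s>": 1.0},
}


def teacher_forced_nll(model, reference):
    """Mean per-token negative log-likelihood of the reference (the training loss)."""
    prev, total = "<s>", 0.0
    for token in reference:
        total -= math.log(model[prev][token])
        prev = token  # teacher forcing: the next step sees the reference token
    return total / len(reference)


def greedy_decode(model, max_len=6):
    """Free-running decoding: each step is conditioned on the model's own output."""
    prev, output = "<s>", []
    for _ in range(max_len):
        dist = model[prev]
        prev = max(dist, key=dist.get)  # most probable next token
        output.append(prev)
        if prev == "</s>":
            break
    return output


reference = ["the", "cat", "sat", "</s>"]
print(round(teacher_forced_nll(TOY_MODEL, reference), 3))
print(greedy_decode(TOY_MODEL))
```

The mean teacher-forced loss is about 0.252 nats per token, yet greedy decoding returns `['the', 'dog', 'dog', 'dog', 'dog', 'dog']`: once the model picks "dog", it is conditioned on its own mistake rather than on the reference, and in this toy table it falls into repetition. This mismatch between training and inference is often called exposure bias.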

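Cross-entropy loss measures the negative log-probability the model assigns to each reference token, averaged over positions, but translation quality depends on word order, fluency, and semantic correctness across the entire sequence. These properties are not directly optimized by standard cross-entropy loss: a single wrong token, such as a dropped negation, contributes only one term to the average while ruining the meaning of the sentence.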
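Sentence-level metrics make the dependence on word order explicit. BLEU, for example, combines clipped n-gram precisions for n = 1 to 4 with a brevity penalty. The sketch below implements only that clipped-precision building block, not full BLEU, and the function names are illustrative. It shows that a hypothesis containing exactly the reference words in scrambled order scores perfectly on unigrams and zero on bigrams.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hypothesis, reference, n):
    """Clipped n-gram precision: each hypothesis n-gram is credited at most
    as many times as it appears in the reference."""
    hyp_counts = ngrams(hypothesis, n)
    if not hyp_counts:
        return 0.0
    ref_counts = ngrams(reference, n)  # missing n-grams count as 0
    overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return overlap / sum(hyp_counts.values())


reference = "the cat sat on the mat".split()
scrambled = "mat the on sat cat the".split()  # same words, wrong order

for n in (1, 2):
    print(f"{n}-gram precision: {modified_precision(scrambled, reference, n)}")
```

For real evaluation, an established implementation such as sacreBLEU is a safer choice than a hand-rolled score, since tokenization and smoothing choices change the numbers.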

Several techniques help close the gap: better decoding strategies such as beam search, regularization such as label smoothing, and sequence-level evaluation (for example, selecting checkpoints by BLEU on a validation set rather than by loss alone). In some setups, reinforcement learning or minimum-risk training is used to optimize sequence-level metrics directly.
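Beam search keeps several partial hypotheses alive instead of committing to the single most probable token at each step. Here is a minimal sketch, again assuming a hypothetical bigram table (`TOY_MODEL`) in place of a real model; the function names and beam width are illustrative, not a specific library's API.

```python
import math

# Hypothetical bigram table standing in for a trained translation model:
# each entry maps the previous token to a next-token distribution.
TOY_MODEL = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.45, "dog": 0.55},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.9, "</s>": 0.1},
    "dog": {"dog": 0.6, "sat": 0.3, "</s>": 0.1},
    "sat": {"</s>": 1.0},
}


def greedy(model, max_len=6):
    """Pick the single most probable token at every step."""
    prev, output = "<s>", []
    for _ in range(max_len):
        dist = model[prev]
        prev = max(dist, key=dist.get)
        output.append(prev)
        if prev == "</s>":
            break
    return output


def beam_search(model, beam_size=2, max_len=6):
    """Return the best hypothesis found and its probability under the model."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (score + math.log(prob), tokens + [token])
            for score, tokens in beams
            for token, prob in model[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == "</s>":
                finished.append((score, tokens))  # complete: stop growing it
            else:
                beams.append((score, tokens))  # still live on the beam
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses that hit max_len without </s>
    best_score, best_tokens = max(finished, key=lambda c: c[0])
    return best_tokens[1:], math.exp(best_score)


print(greedy(TOY_MODEL))
print(beam_search(TOY_MODEL, beam_size=2))
```

Greedy decoding commits to "dog" after "the" and never recovers, while a beam of two keeps "the cat" alive and returns `['the', 'cat', 'sat', '</s>']` with probability of about 0.3645. Two caveats apply: comparing finished hypotheses by raw log-probability favors short outputs, so practical decoders usually add length normalization, and beam search only finds outputs the model already scores highly, so it cannot fix a model whose probabilities are wrong.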
