I trained an LSTM for next-word prediction on text data.
The training loss decreases normally.
But when I generate text, it repeats the same token again and again.
It feels like the model is ignoring the sentence.
This happens because the model has learned a shortcut: always predicting the most frequent token in the dataset.
If padding tokens or very common words dominate the loss, the LSTM can minimize its error simply by emitting the same token every time. The usual culprits are a loss function that does not ignore padding, or heavily imbalanced training data.
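To see how padding can mask the real error, here is a minimal toy sketch (hypothetical numbers, `PAD` id chosen arbitrarily): a "collapsed" model that always predicts the padding token gets a deceptively low average loss unless padding positions are excluded.

```python
import torch
import torch.nn as nn

PAD = 0
# A toy batch where most target positions are padding.
targets = torch.tensor([2, PAD, PAD, PAD, PAD])

# Logits from a "collapsed" model that always predicts PAD.
logits = torch.zeros(5, 4)
logits[:, PAD] = 5.0

naive = nn.CrossEntropyLoss()(logits, targets)
masked = nn.CrossEntropyLoss(ignore_index=PAD)(logits, targets)

# The naive loss looks small because the PAD predictions count as
# "correct"; the masked loss exposes the error on the content token.
assert masked > naive
```

With `ignore_index` set, only the one real token contributes to the loss, so the collapse is no longer rewarded.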
Make sure your loss ignores padding tokens:
criterion = nn.CrossEntropyLoss(ignore_index=pad_token_id)
Also check that during inference you feed the model its own predictions instead of ground-truth tokens.
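A minimal autoregressive decoding loop looks like this. It is a sketch, not your exact code: `model` is assumed to take a `(batch, seq_len)` tensor of token ids and return `(batch, seq_len, vocab)` logits, and greedy argmax is used here only for illustration.

```python
import torch

def generate(model, start_ids, max_new_tokens=20, eos_id=None):
    """Autoregressive decoding: feed the model its OWN previous outputs.

    `model(ids)` is assumed to return logits of shape
    (batch, seq_len, vocab_size) -- adapt to your actual interface.
    """
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        inp = torch.tensor([ids])              # (1, seq_len)
        logits = model(inp)                    # (1, seq_len, vocab)
        next_id = int(logits[0, -1].argmax())  # greedy, for illustration
        ids.append(next_id)                    # feed prediction back in
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```

The key point is the `ids.append(next_id)` line: at inference time there is no ground truth, so each step must consume the previous step's prediction.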
Using temperature sampling during decoding also helps avoid collapse:
probs = torch.softmax(logits / 1.2, dim=-1)
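Putting the temperature step together with actual sampling, a small helper might look like the following sketch (the 1.2 temperature is just the example value from above, not a recommended constant):

```python
import torch

def sample_next(logits, temperature=1.2):
    """Sample the next token id from temperature-scaled logits.

    Higher temperature flattens the distribution, making repeated
    argmax collapse less likely; temperature -> 0 approaches greedy.
    """
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

`torch.multinomial` draws from the distribution instead of always taking the mode, which is what breaks the "same token again and again" loop.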
Common mistakes:
- Including <PAD> in the loss
- Using greedy decoding only
- Training on highly repetitive text
The practical takeaway is that repetition is a training signal problem, not an LSTM architecture problem.