Why does my classifier become unstable after fine-tuning on new data?
This happens because of catastrophic forgetting. When fine-tuned on new data, neural networks overwrite weights that were important for earlier knowledge.
Without constraints, gradient updates push the model to fit the new data at the cost of old patterns. This is especially common when the new dataset is small or biased.
Using lower learning rates, freezing early layers, or mixing old and new data during training reduces this problem.
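For example, here is a minimal PyTorch sketch of a conservative fine-tuning setup; the pretrained ResNet and the choice of which layers to freeze are illustrative assumptions, not a fixed recipe:
import torch
from torchvision import models

# illustrative setup: a pretrained ResNet-18, fine-tuned conservatively
model = models.resnet18(weights="DEFAULT")

# freeze every layer except the final classifier head ("fc")
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

# small learning rate for the parameters that remain trainable
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
Mixing some of the original training data into each new batch (rehearsal) gives similar protection without freezing layers.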
Why does my training crash when I increase sequence length in Transformers?
This happens because Transformer memory grows quadratically with sequence length. Attention layers store interactions between all token pairs.
Long sequences rapidly exceed GPU memory, even if batch size stays the same.
The practical takeaway is that Transformers are limited by attention scaling, not just model size.
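A quick back-of-the-envelope sketch makes the scaling concrete; the batch size, head count, and fp32 precision are illustrative assumptions:
batch, heads, seq_len = 1, 16, 4096
bytes_per_value = 4  # fp32
# one attention score matrix per layer: batch x heads x seq_len x seq_len
score_bytes = batch * heads * seq_len * seq_len * bytes_per_value
print(score_bytes / 1e9, "GB")  # roughly 1.1 GB; doubling seq_len quadruples it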
Why does my deep learning model train fine but fail completely after I load it for inference?
This happens because the preprocessing used during inference does not match the preprocessing used during training.
Neural networks learn patterns in the numerical space they were trained on. If you normalize, tokenize, or scale data during training but skip or change it when running inference, the model sees completely unfamiliar values and produces garbage outputs.
You must save and reuse the exact same preprocessing objects (scalers, tokenizers, and transforms) along with the model. For example, saving and reloading a fitted scaler with joblib:
import joblib

# during training: save the fitted scaler alongside the model
joblib.dump(scaler, "scaler.pkl")
...
# at inference: load and reuse the exact same fitted scaler
scaler = joblib.load("scaler.pkl")
X = scaler.transform(X)
The same applies to image transforms and text tokenizers. Even a small difference like missing standardization will break predictions.
Why does my language model generate repetitive loops?
This happens when decoding is too greedy and the probability distribution collapses. The model finds one safe high-probability phrase and keeps choosing it.
Using temperature scaling, top-k or nucleus sampling introduces controlled randomness so the model explores alternative paths.
Common mistakes:
Using greedy decoding
No sampling strategy
Overconfident probability outputs
The practical takeaway is that generation quality depends heavily on decoding strategy.
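For example, a minimal sketch using the Hugging Face transformers API; the model name and the sampling values are illustrative assumptions to tune for your model:
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The experiment showed that", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,          # sample instead of always taking the argmax
    temperature=0.8,         # soften the distribution
    top_k=50,                # restrict to the 50 most likely tokens
    top_p=0.95,              # nucleus sampling
    repetition_penalty=1.2,  # discourage reusing the same phrase
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))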
Why does my CNN fail on rotated images?
This happens because CNNs are not rotation invariant by default. They learn orientation-dependent features unless trained otherwise.
Including rotated samples during training forces the network to learn rotation-invariant representations.
Common mistakes:
No geometric augmentation
Assuming CNNs handle rotations
The practical takeaway is that invariance must be learned from data.
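For example, a minimal torchvision augmentation pipeline; the rotation range and normalization statistics are illustrative assumptions:
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=30),   # random rotation up to ±30 degrees
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])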
Why does my chatbot answer confidently even when it is wrong?
This happens because language models are trained to produce likely text, not to measure truth or confidence. They generate what sounds plausible based on training patterns.
Since the model does not have a built-in uncertainty estimate, it always outputs the most probable sequence, even when that probability is low. This makes wrong answers sound just as confident as correct ones.
Adding confidence estimation, retrieval-based grounding, or user-visible uncertainty thresholds helps reduce this risk.
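As one rough illustration, the average log-probability of the generated tokens can serve as a crude confidence signal. The helper below is a hypothetical sketch assuming a Hugging Face causal LM and tokenizer pair; it is not a calibrated uncertainty estimate:
import torch

def answer_confidence(model, tokenizer, prompt, max_new_tokens=50):
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
    )
    new_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    # average log-probability of the generated tokens; low values can trigger
    # an "I am not sure" response or a retrieval fallback
    logprobs = [
        torch.log_softmax(score, dim=-1)[0, tok]
        for score, tok in zip(out.scores, new_tokens)
    ]
    return torch.stack(logprobs).mean().item()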
Why does my video recognition model fail when the camera moves?
This happens because the model confuses camera motion with object motion. Without training on moving-camera data, it treats global motion as part of the action.
Neural networks do not automatically separate camera movement from object movement. They must be shown examples where these effects differ.
Using optical flow, stabilization, or training with diverse camera motions improves robustness. The practical takeaway is that motion context matters as much as visual content.
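As a rough illustration, dense optical flow can be used to estimate and subtract global camera motion. This sketch uses OpenCV's Farneback flow; the frame file names are hypothetical placeholders:
import cv2
import numpy as np

prev = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)

# per-pixel (dx, dy) motion field between the two frames
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# the median flow approximates global camera motion; the residual is
# closer to true object motion
camera_motion = np.median(flow.reshape(-1, 2), axis=0)
object_motion = flow - camera_motion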
Why does my CNN suddenly start giving NaN loss after a few training steps?
This happens because invalid numerical values are entering the network, usually from broken data or unstable gradients.
In CNN pipelines, a single corrupted image, division by zero during normalization, or an aggressive learning rate can inject inf or NaN values into the forward pass. Once that happens, every layer after it propagates the corruption and the loss becomes undefined.
Start by checking whether any batch contains bad values:
# inside the training loop, before the forward pass
if torch.isnan(images).any() or torch.isinf(images).any():
    print("Invalid batch detected")
Make sure images are converted to floats and normalized only once, for example by dividing by 255 or using mean–std normalization. If the data is clean, reduce the learning rate and apply gradient clipping:
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
Mixed-precision training can also cause this, so disable AMP temporarily if you are using it.
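If the source is still unclear, PyTorch's anomaly detection can point to the first backward operation that produces a NaN; it slows training, so enable it only while debugging:
import torch

torch.autograd.set_detect_anomaly(True)  # raises an error at the offending op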
Why does my vision model fail when lighting conditions change?
This happens because your model has learned lighting patterns instead of object features. Neural networks learn whatever statistical signals are most consistent in the training data, and if most images were taken under similar lighting, the network uses brightness and color as shortcuts.
When lighting changes, those shortcuts no longer hold, so the learned representations stop matching what the model expects. This causes predictions to collapse even though the objects themselves have not changed. The network is not failing — it is simply seeing a distribution shift.
The solution is to use aggressive data augmentation, such as brightness, contrast, and color jitter, so the model learns features that are invariant to lighting. This forces the CNN to focus on shapes, edges, and textures instead of raw pixel intensity.
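For example, a minimal torchvision pipeline with photometric augmentation; the jitter strengths are illustrative assumptions:
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
])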
Why does my autoencoder reconstruct training images well but fail on new ones?
This happens because the autoencoder has overfit the training distribution. Instead of learning general representations, it memorized pixel-level details of the training images, which do not generalize.
Autoencoders with too much capacity can easily become identity mappings, especially when trained on small or uniform datasets. In this case, low loss simply means the network copied what it saw.
Reducing model size, adding noise, or using variational autoencoders forces the model to learn meaningful latent representations instead of memorization.
Common mistakes:
Using too large a bottleneck
No noise or regularization
Training on limited data
The practical takeaway is that low reconstruction loss does not mean useful representations.
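For example, a minimal denoising setup in PyTorch; the noise level is an illustrative assumption, and model, loader, and optimizer are assumed to already exist:
import torch
import torch.nn.functional as F

noise_std = 0.1
for clean, _ in loader:
    noisy = clean + noise_std * torch.randn_like(clean)  # corrupt the input
    recon = model(noisy)
    loss = F.mse_loss(recon, clean)  # but reconstruct the clean target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()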