I trained an object detection model on a mixed dataset containing people, vehicles, and small objects like phones and traffic signs.
The model detects large objects such as cars and people very reliably.
However, it almost completely ignores smaller objects, even when they are clearly visible.
The training loss looks normal, so why do the predictions seem biased toward larger shapes?
This happens because most detection architectures naturally favor large objects due to how feature maps are constructed. In convolutional networks, deeper layers capture high-level features but at the cost of spatial resolution. Small objects can disappear in these layers, making them difficult for the detector to recognize.
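To make the resolution loss concrete, here is a minimal sketch (a hypothetical toy backbone, assuming PyTorch) showing how five stride-2 stages shrink a 640×640 image to a 20×20 feature map, at which point a 20-pixel object occupies less than a single cell:

```python
import torch
import torch.nn as nn

# Toy backbone: each stride-2 conv halves spatial resolution,
# so five stages give an overall stride of 32.
backbone = nn.Sequential(
    *[nn.Conv2d(3 if i == 0 else 16, 16, kernel_size=3, stride=2, padding=1)
      for i in range(5)]
)

image = torch.randn(1, 3, 640, 640)   # input image
feature_map = backbone(image)         # deepest feature map
print(feature_map.shape)              # torch.Size([1, 16, 20, 20])

# A 20x20-pixel object spans 20/32 < 1 cell on this map, so its
# signal is blended with the surrounding background.
```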
If your model uses only high-level feature maps for detection, the network simply does not see enough detail to identify small items. This is why modern detectors use feature pyramids or multi-scale feature maps. Without these, the network cannot learn reliable representations for objects that occupy only a few pixels.
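As an illustration, torchvision ships a generic `FeaturePyramidNetwork` module that fuses coarse, semantically rich maps with fine, high-resolution ones. The level names and channel counts below are assumptions for the sketch, standing in for a real backbone's outputs at strides 8, 16, and 32:

```python
from collections import OrderedDict

import torch
from torchvision.ops import FeaturePyramidNetwork

# Assumed backbone outputs at three strides (channel counts are illustrative).
features = OrderedDict([
    ("c3", torch.randn(1, 256, 80, 80)),   # stride 8: fine detail, small objects
    ("c4", torch.randn(1, 512, 40, 40)),   # stride 16
    ("c5", torch.randn(1, 1024, 20, 20)),  # stride 32: strong semantics, large objects
])

fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024], out_channels=256)

outputs = fpn(features)
for name, feat in outputs.items():
    print(name, feat.shape)  # every level now carries 256-channel fused features
```

The key point is that the stride-8 output keeps enough spatial detail for small objects while still receiving semantic context propagated down from the deeper levels.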
Practical remedies: use an architecture with a feature pyramid network (FPN), increase the input resolution, and add more small-object examples to the training set. You should also check your anchor sizes and make sure they match the scale of the objects in your dataset.
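For example, with torchvision's Faster R-CNN you can swap in an `AnchorGenerator` whose smallest anchors match your small objects. This is a hedged sketch, not a recommendation: the specific sizes (16–256 px) are illustrative, and you should derive them from the box statistics of your own dataset:

```python
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet-50 + FPN backbone with five output levels.
backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)

# One anchor size per FPN level; the smallest (16 px) targets small objects.
# These values are illustrative, not tuned for any particular dataset.
anchor_generator = AnchorGenerator(
    sizes=((16,), (32,), (64,), (128,), (256,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,  # same aspect ratios at every level
)

model = FasterRCNN(
    backbone,
    num_classes=91,  # placeholder; set to your number of classes + background
    rpn_anchor_generator=anchor_generator,
)
```

Compared with the default configuration (whose smallest anchor is 32 px), shifting the anchor range downward gives the region proposal network candidates that small objects can actually match during training.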