For years, progress in computer vision was driven almost exclusively by technological advances: ever-larger models, more powerful GPUs, and increasingly sophisticated algorithms. However, as these capabilities have matured, it has become clear that the real limitation is no longer the model itself, but the data used to train it.
In retail—and especially in supermarkets—this reality is even more evident. A computer vision system can only correctly recognize what it has seen before. If the dataset does not faithfully reflect the real in-store environment, the result is an AI that performs well under controlled conditions but fails when confronted with the real world. Shadows, transparent bags, reflective trays, changing lighting, fruit at different stages of ripeness, or damaged packaging are all part of everyday supermarket operations and must be represented in the data if a truly robust solution is to be built.
Today, the main bottleneck in computer vision is no longer technology, but the quality, diversity, and representativeness of the dataset.
A Good Model Cannot Compensate for a Poor Dataset
In computer vision, a model’s ability to generalize depends directly on the data it has seen during training. No matter how advanced the algorithm is, if the dataset is limited or unrealistic, the model will make recurring mistakes in production.
This translates into frequent failures in common in-store situations, such as partially occluded products, shiny packaging, fruit that is overripe or underripe, or variations in shape and size among units of the same product. When the data does not represent the world as it really is—with all its imperfections—AI becomes unreliable precisely when it is needed most.
The Extreme Variability of the Supermarket
One of the greatest challenges in retail is the enormous visual variability of the environment. The same product can look radically different depending on the store, the time of day, or even the season of the year. Lighting varies across locations and throughout the day, fixtures introduce different backgrounds, cameras age or are replaced, and the products themselves vary in color, size, and texture.
A kilo of tomatoes does not look the same in winter as it does in summer, and an apple under warm light is perceived very differently from one under cold lighting. If this variability is not captured in the dataset, it will inevitably appear as errors during inference. That is why building rich and diverse datasets is not a luxury, but a necessity.
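One common way to teach a model some of this lighting variability is augmentation: synthetically shifting images toward warmer or colder light during training. The sketch below is a minimal, hypothetical illustration of the idea on raw `(r, g, b)` pixel tuples; the `0.3` gain and the uniform warmth range are illustrative choices, not values from any particular pipeline.

```python
import random

def apply_color_temperature(pixel, warmth):
    """Shift an (r, g, b) pixel toward warm (warmth > 0) or cold (warmth < 0) light."""
    r, g, b = pixel
    # Warm light boosts the red channel and suppresses blue; cold light does the reverse.
    r = min(255, max(0, round(r * (1 + 0.3 * warmth))))
    b = min(255, max(0, round(b * (1 - 0.3 * warmth))))
    return (r, g, b)

def augment_lighting(pixels, seed=None):
    """Apply one random lighting shift to every pixel of an image."""
    rng = random.Random(seed)
    warmth = rng.uniform(-1.0, 1.0)  # -1 = very cold, +1 = very warm
    return [apply_color_temperature(p, warmth) for p in pixels]
```

Augmentation of this kind can stretch a dataset's coverage, but it is a complement to real in-store diversity, not a substitute for it: no synthetic shift reproduces how an actual winter tomato looks on an actual shelf.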
Annotation: A Critical Detail with Major Impact
Dataset quality does not depend solely on the images themselves, but also on how they are annotated. Small labeling errors can have a disproportionate impact on a model’s final performance.
Incorrect labels, overly generic categories, imprecise bounding boxes, or inconsistencies between annotators can introduce noise that is amplified during training. In fresh produce, where very similar varieties exist and visual differences are subtle, poor annotation can lead to systematic confusion. It is not uncommon for a small percentage of annotation errors to result in a much larger drop in the final model’s accuracy.
Without rigorous and consistent annotation, reliable computer vision simply does not exist.
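One practical way to measure annotation consistency is to have two annotators label the same sample of images and compute an agreement statistic such as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal, generic implementation; the label names are made up for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    po = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    pe = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (po - pe) / (1 - pe)
```

A kappa close to 1.0 indicates consistent labeling; values that drop well below that on visually similar produce varieties are an early warning that the category definitions, or the annotation guidelines, need tightening before training begins.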
The Problem with “Pretty” Datasets
Many traditional datasets consist of clean, well-lit images without obstructions. They may look appealing, but they are not representative of real supermarket conditions. Yet the cases that actually generate errors and fraud are not the ideal ones, but the difficult ones.
Reflective trays, transparent bags, deformed products, cut fruit, hands partially covering items, or harsh shadows are everyday situations in stores. A robust dataset must deliberately include these problematic scenarios. If models are trained only on “perfect” images, the AI will be fragile precisely where it should deliver the most value.
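If hard cases are tagged during curation, one simple way to make sure the model keeps seeing them is to guarantee a minimum share of them in every training batch. The snippet below is a hedged sketch of that idea; the `hard_fraction` value and the easy/hard split are illustrative assumptions, not a prescription.

```python
import random

def sample_batch(easy, hard, batch_size, hard_fraction=0.3, seed=None):
    """Draw a training batch with a guaranteed minimum share of hard cases.

    `easy` and `hard` are lists of sample identifiers, pre-tagged during
    dataset curation (e.g. reflective trays, occlusions, harsh shadows).
    """
    rng = random.Random(seed)
    n_hard = max(1, round(batch_size * hard_fraction))
    n_easy = batch_size - n_hard
    return rng.sample(hard, n_hard) + rng.sample(easy, n_easy)
```

Without this kind of deliberate oversampling, difficult scenarios that make up a small fraction of the data contribute almost nothing to the training signal, and the model stays fragile exactly where the article argues it matters most.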
Data Drift: When the Dataset Ages
The supermarket environment is not static, and datasets should not be either. New suppliers, packaging changes, store remodels, hardware replacements, or even new customer habits gradually alter the data distribution.
When real-world data begins to diverge from the data used to train the model, the phenomenon known as data drift emerges. If it is not detected and corrected in time, system performance degrades silently. That is why keeping a dataset alive and up to date is just as important as maintaining the model itself.
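Drift of this kind can be monitored by comparing the distribution of incoming data against the training distribution. A common choice is the Population Stability Index (PSI) over binned feature statistics; the sketch below is a minimal, generic version, and the conventional alert threshold of roughly 0.25 is a rule of thumb, not a universal constant.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    `expected_counts` come from the training data, `actual_counts` from
    recent production data, binned the same way (e.g. image brightness bins).
    """
    total_e, total_a = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / total_e, eps)  # clamp to avoid log(0) on empty bins
        pa = max(a / total_a, eps)
        score += (pa - pe) * math.log(pa / pe)
    return score
```

A PSI near zero means the incoming data still resembles the training set; a rising value flags silently shifting conditions, such as a remodel or a camera swap, before accuracy metrics visibly degrade.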
Conclusion
Computer vision does not fail because models are bad, but because datasets do not always represent the complexity of the real supermarket environment. Today, dataset quality is the factor that makes the difference between an AI that works well in a demo and an AI that performs consistently in-store.
Advanced models, powerful hardware, and edge computing are all essential pieces, but none can replace data that is diverse, realistic, well annotated, up to date, and rich in challenging cases. The true bottleneck in computer vision is no longer the algorithm, but the data.
Retailers who understand this reality will be the ones to deploy reliable, robust, and scalable AI solutions—capable of succeeding in the most demanding environment of all: the real supermarket.