Build·6 min read·May 20, 2026

How I Built a Smart Store Computer Vision System

On-shelf inventory counting with YOLO and OpenCV — model training, edge deployment, and what labelling quality actually means in practice.

Manual stock counting in a retail or warehouse environment is expensive, inconsistent, and usually too infrequent to be operationally useful. A weekly physical count tells you what was on the shelf seven days ago. The Smart Store Computer Vision system was built to provide near-real-time shelf visibility — counting stock continuously using cameras already mounted in the space, without any manual intervention.

The system uses object detection to identify and count individual SKUs on shelves. A camera feeds a video stream into an edge-deployed detection model, which produces a count per SKU per shelf location on a configurable cadence. Count discrepancies against the expected planogram trigger an alert to the store operations team. The output integrates with the inventory management system to update on-hand positions without manual entry.
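The counting and planogram comparison described above can be sketched in a few lines. This is a minimal illustration, not the production code: the `(sku_id, confidence)` detection format, the `min_confidence` cut-off, the `tolerance` parameter, and the SKU names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical detection record: (sku_id, confidence). A real detector also
# returns box coordinates, which are omitted here for brevity.
Detection = tuple[str, float]


def count_skus(detections: list[Detection], min_confidence: float = 0.5) -> Counter:
    """Count detections per SKU, ignoring low-confidence ones."""
    return Counter(sku for sku, conf in detections if conf >= min_confidence)


def find_discrepancies(
    counts: Counter, planogram: dict[str, int], tolerance: int = 0
) -> dict[str, tuple[int, int]]:
    """Return {sku: (expected, observed)} where the gap exceeds the tolerance."""
    discrepancies = {}
    for sku, expected in planogram.items():
        observed = counts.get(sku, 0)  # a SKU with no detections counts as zero
        if abs(expected - observed) > tolerance:
            discrepancies[sku] = (expected, observed)
    return discrepancies


detections = [("cola-330", 0.9), ("cola-330", 0.8), ("cola-330", 0.3), ("chips-50", 0.7)]
counts = count_skus(detections)
gaps = find_discrepancies(counts, {"cola-330": 4, "chips-50": 1}, tolerance=1)
```

Iterating over the planogram rather than the detections means a SKU that has vanished from the shelf entirely is still reported, with an observed count of zero.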

Architecture

The detection model is YOLOv8, chosen for its inference speed on edge hardware. The preprocessing pipeline uses OpenCV for frame extraction, normalisation, and region-of-interest cropping, so the model sees only the shelf area rather than the full camera frame. This reduces compute load and improves detection accuracy in complex scenes. The edge deployment runs on an NVIDIA Jetson device, which keeps inference local instead of sending video to a central server. That reduces latency and removes the bandwidth needed for continuous video streaming.
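The region-of-interest step might look roughly like the sketch below. Frames read through OpenCV's Python bindings are NumPy arrays, so the crop is plain array slicing and the example needs only NumPy. The `(x, y, width, height)` ROI format and the function name are assumptions for illustration, and whether explicit scaling to the 0 to 1 range is needed depends on the inference runtime.

```python
import numpy as np


def crop_and_normalise(frame: np.ndarray, roi: tuple[int, int, int, int]) -> np.ndarray:
    """Crop a frame to the shelf region and scale pixel values to [0, 1].

    roi is (x, y, width, height) in pixels; this format is an assumption.
    """
    x, y, w, h = roi
    frame_h, frame_w = frame.shape[:2]
    if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > frame_w or y + h > frame_h:
        raise ValueError(f"ROI {roi} falls outside a {frame_w}x{frame_h} frame")
    shelf = frame[y:y + h, x:x + w]  # rows index y, columns index x
    return shelf.astype(np.float32) / 255.0


# A synthetic all-white 640x480 frame stands in for a real camera frame.
frame = np.full((480, 640, 3), 255, dtype=np.uint8)
shelf = crop_and_normalise(frame, (100, 50, 200, 120))
```

Validating the ROI up front matters because NumPy slicing silently truncates out-of-range indices, which would otherwise produce a smaller crop than configured without raising any error.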

Count aggregation runs on a lightweight Python service on the same device. When a discrepancy exceeds the configured threshold, the service pushes an alert payload to a central API, which writes to the inventory system and triggers a notification to the operations dashboard. The dashboard is a simple React application showing current counts by shelf location with a colour-coded status view.
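A sketch of what the aggregation and alert step could look like follows. The rolling median is an assumption introduced here to keep a single occluded frame from triggering an alert; the article does not say how counts are smoothed. The class name, payload fields, and window size are likewise hypothetical.

```python
import json
from collections import deque
from statistics import median


class ShelfCountAggregator:
    """Smooth per-frame counts for one SKU at one shelf location."""

    def __init__(self, expected: int, threshold: int, window: int = 5):
        self.expected = expected
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically

    def add(self, frame_count: int):
        """Record a per-frame count; return an alert dict, or None if all is well."""
        self.samples.append(frame_count)
        if len(self.samples) < self.samples.maxlen:
            return None  # not enough frames yet to judge
        smoothed = int(median(self.samples))
        if abs(self.expected - smoothed) > self.threshold:
            return {"expected": self.expected, "observed": smoothed}
        return None


def build_alert_payload(shelf_id: str, sku: str, alert: dict) -> str:
    """Serialise an alert as JSON for the central API (field names are illustrative)."""
    return json.dumps({"shelf_id": shelf_id, "sku": sku, **alert}, sort_keys=True)


agg = ShelfCountAggregator(expected=10, threshold=2, window=3)
alerts = [agg.add(count) for count in (10, 4, 5)]
payload = build_alert_payload("bay-3", "cola-330", alerts[-1])
```

An odd window size keeps the median an integer; with an even window, the `int()` call truncates the half-value downwards.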

The Labelling Lesson

The most significant lesson from this build was about labelling quality. The initial dataset was assembled quickly — images from a mix of store environments with annotations done at speed. The model trained on this dataset produced poor results, not because YOLO is incapable, but because the labels were inconsistent. The same product photographed in different lighting conditions had been annotated differently by different labellers. Occlusions were handled inconsistently. The class definitions were ambiguous for products with similar packaging.

After rebuilding the dataset with tighter annotation guidelines — single labeller, explicit handling rules for occlusion and edge cases, consistent class definitions per SKU family — model performance improved substantially without any change to the architecture or training configuration. The lesson is not subtle: in computer vision, the dataset is the product. Everything else is implementation.
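One way to surface the kind of inconsistency described above is to have the same images annotated twice and compare the results with intersection-over-union (IoU). The check below is an illustrative sketch rather than the process used in this build; the annotation format, the `min_iou` value, and the SKU class names are assumptions.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Annotation = tuple[str, Box]             # (class_name, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def flag_disagreements(
    first: list[Annotation], second: list[Annotation], min_iou: float = 0.8
) -> list[tuple[int, str]]:
    """Compare two annotation passes over one image and list the conflicts."""
    issues = []
    for i, (cls_a, box_a) in enumerate(first):
        # Best-overlapping box from the second pass, or None if it is empty.
        best = max(second, key=lambda ann: iou(box_a, ann[1]), default=None)
        if best is None or iou(box_a, best[1]) < min_iou:
            issues.append((i, "no matching box"))
        elif best[0] != cls_a:
            issues.append((i, f"class mismatch: {cls_a} vs {best[0]}"))
    return issues


first_pass = [
    ("cola-330", (0, 0, 10, 10)),
    ("cola-zero-330", (20, 0, 30, 10)),
    ("chips-50", (40, 0, 50, 10)),
]
second_pass = [
    ("cola-330", (0, 0, 10, 10)),
    ("cola-330", (20, 0, 30, 10)),  # similar packaging, different class chosen
    ("chips-50", (40, 0, 50, 5)),   # partially occluded item boxed differently
]
issues = flag_disagreements(first_pass, second_pass)
```

The two flagged cases correspond to the failure modes named above: ambiguous classes for similar packaging, and inconsistent handling of occlusion.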

Capabilities and Outcomes

The deployed system counts stock continuously during operating hours, provides location-level visibility by shelf bay, and integrates with existing inventory systems via API. Alert thresholds are configurable per SKU category. The system reduced the frequency of manual cycle counts from daily to weekly in the pilot deployment, with the automated system handling the intra-day visibility that manual counting could not provide. Shrinkage identification improved because discrepancies are flagged in near-real-time rather than discovered at the next manual count.
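Per-category alert thresholds can be represented as a small configuration object with a fallback default. The sketch below assumes thresholds are expressed as a count of units; the category names and values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlertThresholds:
    """Allowed count discrepancy per SKU category, with a fallback default."""

    default: int = 2
    by_category: dict[str, int] = field(default_factory=dict)

    def for_category(self, category: str) -> int:
        # dict.get is used so that an explicit threshold of 0 is respected.
        return self.by_category.get(category, self.default)


thresholds = AlertThresholds(default=3, by_category={"beverages": 5, "high-value": 0})
beverage_limit = thresholds.for_category("beverages")
snack_limit = thresholds.for_category("snacks")  # falls back to the default
```

A threshold of zero for a high-value category means any deviation from the planogram raises an alert, which suits the shrinkage use case.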
