NVIDIA - Manufacturing

Reducing Manufacturing Defects by 89%

How we built a computer vision quality control system using NVIDIA hardware that catches defects invisible to the human eye.

Duration

8 weeks

Team

2 engineers, 1 computer vision specialist, 1 PM

Tech Stack

Python, PyTorch, EfficientNet, NVIDIA Jetson AGX Orin, NVIDIA TensorRT, OpenCV, FLIR Spinnaker SDK, MQTT, PostgreSQL, React, Grafana, Docker, FastAPI

The Challenge

NVIDIA's hardware manufacturing division produces precision-machined components for their GPU and networking products, running three production lines that output approximately 12,000 parts per day. Their quality control process relied on a team of 8 human inspectors working in shifts, each visually examining parts under magnification for surface defects (scratches, pitting, discoloration), dimensional deviations, and assembly errors. Inspectors caught roughly 60% of defects that made it through the production process - the remaining 40% escaped to downstream assembly, resulting in an average of $340K per month in rework, warranty claims, and customer credits. Beyond direct costs, two major OEM customers had issued formal quality warnings, putting $4.8M in annual contracts at risk.

NVIDIA had previously invested $180K in a commercial machine vision system from a European vendor. The system used traditional computer vision techniques - template matching, edge detection, and hand-tuned threshold parameters - and worked reasonably well for detecting gross dimensional defects but failed on surface defects, which account for 68% of their quality escapes. The system required manual recalibration every time a new part number was introduced (NVIDIA's manufacturing runs 340+ active part numbers), and the false positive rate was 22%, meaning inspectors spent significant time reviewing parts the system incorrectly flagged. After 8 months, the system was used on only 3 of their highest-volume part numbers, and inspectors had largely reverted to manual inspection.

Our Approach

We spent the first week on the factory floor, studying the inspection process, cataloging defect types, and understanding the production environment - lighting conditions, line speeds, part positioning variability, and the specific defect modes for different materials and machining operations. We identified 14 distinct defect categories across NVIDIA's product range, from micro-scratches under 0.1mm to subtle surface discoloration caused by tool wear.

For the imaging setup, we designed a multi-camera inspection station using 4 industrial cameras (FLIR Blackfly S, 12MP, global shutter) positioned at fixed angles to capture the top, two sides, and a 45-degree oblique view of each part. We added controlled lighting (a combination of diffuse dome illumination and directional LED bars) to maximize surface defect visibility - this was critical because 40% of defects that human inspectors missed were only visible under specific lighting angles. Parts are conveyed through the inspection station on a servo-driven belt with precise positioning, achieving sub-millimeter repeatability.
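To see why a 12MP sensor matters for defects under 0.1mm, it helps to work out how many pixels such a scratch actually covers. The sketch below assumes a 4000 x 3000 sensor and a 100 mm x 75 mm field of view per camera; neither number comes from this project, and the 3-pixel minimum is a rule-of-thumb assumption rather than a measured requirement.

```python
# Hypothetical optics: ~12 MP sensor (4000 x 3000) viewing a 100 mm x 75 mm area.
# These values are illustrative assumptions, not the station's actual configuration.
SENSOR_WIDTH_PX = 4000
FOV_WIDTH_MM = 100.0
MIN_PIXELS_PER_DEFECT = 3  # assumed rule of thumb for reliable detection

def mm_per_pixel(fov_mm: float, pixels: int) -> float:
    """Object-space size covered by one pixel along one axis."""
    return fov_mm / pixels

def pixels_across(feature_mm: float, fov_mm: float, pixels: int) -> float:
    """Number of pixels a feature of the given size spans along that axis."""
    return feature_mm / mm_per_pixel(fov_mm, pixels)

scale_x = mm_per_pixel(FOV_WIDTH_MM, SENSOR_WIDTH_PX)
span = pixels_across(0.1, FOV_WIDTH_MM, SENSOR_WIDTH_PX)
resolvable = span >= MIN_PIXELS_PER_DEFECT
print(f"{scale_x:.3f} mm/px, a 0.1 mm scratch spans {span:.1f} px, resolvable={resolvable}")
```

Under these assumptions a 0.1 mm scratch covers about 4 pixels, which leaves little margin; that is part of why lighting that maximizes the scratch's contrast against the surrounding surface matters as much as raw resolution.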

For the model, we evaluated YOLOv8, Faster R-CNN, and a custom architecture based on EfficientNet with a Feature Pyramid Network (FPN). After benchmarking on our annotated dataset of 28,000 images (collected over 3 weeks of production, with each defect category represented by at least 400 examples), the EfficientNet-FPN architecture achieved the best balance of accuracy (99.5% recall at 97.8% precision) and inference speed (23ms per image on NVIDIA T4). We used extensive data augmentation - random rotations, brightness and contrast variations, synthetic defect overlay - to make the model robust to the natural variability of factory conditions. For part numbers with limited defect examples, we trained a separate anomaly detection model using an autoencoder architecture that learned the distribution of "good" parts and flagged deviations.
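The FPN half of an EfficientNet-FPN design merges coarse, semantically strong feature maps into finer ones, so that small defects in high-resolution maps still get context from deep layers. The NumPy sketch below shows only that top-down merge, with random weights in place of learned ones and the usual 3x3 smoothing convolutions left out. The backbone outputs are random arrays whose channel counts (40, 112, 320) are an assumption modelled on EfficientNet-B0, not the project's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(feat: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution as a per-pixel linear map. feat: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feat)

def upsample2x(feat: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling along H and W."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(c3, c4, c5, out_channels=64):
    """FPN top-down pathway: project each level to a common width with a lateral 1x1 conv,
    then add the upsampled coarser level into the finer one. Weights are random stand-ins."""
    laterals = [
        conv1x1(c, rng.standard_normal((out_channels, c.shape[0])) * 0.1)
        for c in (c3, c4, c5)
    ]
    p5 = laterals[2]
    p4 = laterals[1] + upsample2x(p5)
    p3 = laterals[0] + upsample2x(p4)
    return p3, p4, p5

# Hypothetical backbone outputs at strides 8, 16 and 32 for a 256 x 256 crop.
c3 = rng.standard_normal((40, 32, 32))
c4 = rng.standard_normal((112, 16, 16))
c5 = rng.standard_normal((320, 8, 8))
p3, p4, p5 = fpn_top_down(c3, c4, c5)
print(p3.shape, p4.shape, p5.shape)  # (64, 32, 32) (64, 16, 16) (64, 8, 8)
```

A detection head would then run on each of p3, p4 and p5; the finest map, p3, is where defects only a few pixels wide have the best chance of being localized.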
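A minimal sketch of the augmentation ideas described above, assuming grayscale uint8 images and NumPy only. Two simplifications are worth naming: rotations are restricted to multiples of 90 degrees (arbitrary angles would need interpolation, for example with OpenCV's warpAffine), and the "synthetic defect" is just a darkened line standing in for a scratch. All parameter ranges are illustrative guesses, not the project's settings.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator):
    """Return an augmented copy of a grayscale uint8 image plus a boolean mask
    marking any synthetic defect that was drawn."""
    img = image.astype(np.float32)
    mask = np.zeros(image.shape, dtype=bool)

    # Synthetic defect overlay: darken a thin horizontal line as a crude scratch stand-in.
    if rng.random() < 0.5:
        h, w = img.shape
        row = rng.integers(0, h)
        start = rng.integers(0, w // 2)
        length = rng.integers(w // 8, w // 2)
        img[row, start:start + length] *= 0.4
        mask[row, start:start + length] = True

    # Brightness and contrast jitter: out = alpha * in + beta.
    alpha = rng.uniform(0.8, 1.2)
    beta = rng.uniform(-20, 20)
    img = alpha * img + beta

    # Rotation by a multiple of 90 degrees; the mask is rotated with the image.
    k = int(rng.integers(0, 4))
    img = np.rot90(img, k)
    mask = np.rot90(mask, k)

    return np.clip(img, 0, 255).astype(np.uint8), mask

rng = np.random.default_rng(42)
part = np.full((64, 64), 180, dtype=np.uint8)  # flat synthetic "good" surface
aug, defect_mask = augment(part, rng)
print(aug.shape, aug.dtype, int(defect_mask.sum()))
```

Returning the mask alongside the image matters for detection training: any geometric transform applied to the image has to be applied to the labels too, or the annotation no longer lines up with the defect.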
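The anomaly-detection idea (learn what good parts look like, then flag anything that reconstructs poorly) can be shown without a deep network. The sketch swaps in PCA, which gives the optimal linear autoencoder under squared error, in place of the convolutional autoencoder used in the project. The synthetic feature vectors, the 3-component size, and the 99.5th-percentile threshold are all hypothetical.

```python
import numpy as np

class LinearAutoencoder:
    """PCA stand-in for an autoencoder: encode by projecting onto the top-k principal
    directions of good-part data, decode by projecting back."""

    def __init__(self, n_components: int):
        self.k = n_components

    def fit(self, good: np.ndarray) -> "LinearAutoencoder":
        self.mean = good.mean(axis=0)
        _, _, vt = np.linalg.svd(good - self.mean, full_matrices=False)
        self.components = vt[: self.k]  # (k, n_features)
        return self

    def reconstruction_error(self, x: np.ndarray) -> np.ndarray:
        centered = x - self.mean
        codes = centered @ self.components.T  # encode
        recon = codes @ self.components       # decode
        return ((centered - recon) ** 2).mean(axis=1)

def fit_threshold(model, good_holdout: np.ndarray, percentile: float = 99.5) -> float:
    """Anything whose error exceeds this percentile of good-part errors gets flagged."""
    return float(np.percentile(model.reconstruction_error(good_holdout), percentile))

# Synthetic "good" parts: 50-dim feature vectors near a 3-dim subspace plus small noise.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 50))

def sample_good(n: int) -> np.ndarray:
    return rng.standard_normal((n, 3)) @ basis + 0.05 * rng.standard_normal((n, 50))

ae = LinearAutoencoder(n_components=3).fit(sample_good(500))
threshold = fit_threshold(ae, sample_good(200))

defective = sample_good(1)
defective[0, 10:15] += 3.0  # localized deviation off the good-part subspace
print(ae.reconstruction_error(defective)[0] > threshold)  # True
```

Because the model only ever sees good parts, it needs no labelled defects, which is what makes this approach usable for part numbers with few defect examples.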

The Solution

The production system consists of the multi-camera inspection station integrated directly into NVIDIA's production lines. Each camera feeds images to an NVIDIA Jetson AGX Orin edge computing unit mounted at the line. The EfficientNet-FPN model runs inference on all four camera views in parallel, completing full-part inspection in under 90ms - well within the 2-second cycle time of the fastest production line. When a defect is detected, the system triggers a pneumatic diverter that routes the part to a reject bin and logs the defect type, location, confidence score, and source image to a PostgreSQL database.

A web-based dashboard (React, Grafana) provides real-time defect rate monitoring, trend analysis by part number and defect category, and shift-over-shift comparisons. The model retraining pipeline runs bi-weekly: new confirmed defect images are added to the training set, and an updated model is validated against a holdout test set before deployment. The system handles part number changeovers automatically - the production line's manufacturing execution system (MES) publishes the active part number via MQTT, and the inspection system loads the corresponding model configuration.
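One way to structure the per-part decision is to run the detector on the four views concurrently and reject the part if any view reports a defect above threshold. This is a sketch under assumptions: `run_model` is a placeholder for the real TensorRT inference call, and the view names and the 0.5 threshold are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Detection:
    view: str
    defect_type: str
    confidence: float
    bbox: tuple  # (x, y, w, h) in pixels

VIEWS = ("top", "side_left", "side_right", "oblique_45")  # hypothetical view names
REJECT_THRESHOLD = 0.5  # hypothetical; could be tuned per defect category

def inspect_part(images: dict, run_model) -> tuple[bool, list]:
    """Run the detector on every view concurrently; the part is rejected if any
    view has a detection at or above the threshold."""
    with ThreadPoolExecutor(max_workers=len(VIEWS)) as pool:
        futures = [pool.submit(run_model, view, images[view]) for view in VIEWS]
        detections = [d for f in futures for d in f.result()]
    defects = [d for d in detections if d.confidence >= REJECT_THRESHOLD]
    return bool(defects), defects

# Usage with a fake model that only sees a scratch in the oblique view.
def fake_model(view, image):
    if view == "oblique_45":
        return [Detection(view, "micro_scratch", 0.93, (120, 40, 18, 3))]
    return []

reject, defects = inspect_part({v: None for v in VIEWS}, fake_model)
print(reject, [d.defect_type for d in defects])  # True ['micro_scratch']
```

On a GPU, batching the four views into a single inference call is often faster than four concurrent calls; the thread pool here mainly makes the "any view rejects the part" rule explicit.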
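The defect log can be sketched as two tables, one row per inspected part and one row per defect, which is enough to drive a reject-rate-by-part-number dashboard panel. Python's built-in sqlite3 stands in for PostgreSQL here so the example runs anywhere; the table names, columns, and part numbers are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE inspections (
    id          INTEGER PRIMARY KEY,
    part_number TEXT    NOT NULL,
    line_id     INTEGER NOT NULL,
    rejected    INTEGER NOT NULL        -- 1 if diverted to the reject bin
);
CREATE TABLE defects (
    inspection_id INTEGER NOT NULL REFERENCES inspections(id),
    defect_type   TEXT    NOT NULL,
    x_px          INTEGER NOT NULL,
    y_px          INTEGER NOT NULL,
    confidence    REAL    NOT NULL,
    image_path    TEXT    NOT NULL
);
"""

def log_inspection(conn, part_number, line_id, defects):
    """Record one inspected part and any defects found on it.
    Each defect is (defect_type, x_px, y_px, confidence, image_path)."""
    cur = conn.execute(
        "INSERT INTO inspections (part_number, line_id, rejected) VALUES (?, ?, ?)",
        (part_number, line_id, int(bool(defects))),
    )
    conn.executemany(
        "INSERT INTO defects VALUES (?, ?, ?, ?, ?, ?)",
        [(cur.lastrowid, *d) for d in defects],
    )

def reject_rate_by_part(conn):
    """Fraction of inspected parts rejected, grouped by part number."""
    return conn.execute(
        "SELECT part_number, AVG(rejected) FROM inspections "
        "GROUP BY part_number ORDER BY part_number"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log_inspection(conn, "PN-A", 1, [])
log_inspection(conn, "PN-A", 1, [("micro_scratch", 812, 344, 0.97, "img/0002_top.png")])
log_inspection(conn, "PN-B", 2, [])
print(reject_rate_by_part(conn))  # [('PN-A', 0.5), ('PN-B', 0.0)]
```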
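The validation step in the bi-weekly retraining loop can be expressed as a promotion gate: compute the candidate's recall and precision on the holdout set and deploy only if it does not regress. The thresholds below (a 99.5% recall floor and at most 0.5 points of precision loss) are assumptions for illustration, not the project's actual criteria, and the counts in the usage example are made up.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    recall: float
    precision: float

def metrics(tp: int, fp: int, fn: int) -> EvalResult:
    """Recall and precision from holdout-set true/false positive and false negative counts."""
    return EvalResult(recall=tp / (tp + fn), precision=tp / (tp + fp))

def should_deploy(candidate: EvalResult, current: EvalResult,
                  min_recall: float = 0.995, max_precision_drop: float = 0.005) -> bool:
    """Promote the retrained model only if it meets the recall floor, does not lose recall
    relative to the current model, and gives up at most a small amount of precision."""
    return (candidate.recall >= min_recall
            and candidate.recall >= current.recall
            and candidate.precision >= current.precision - max_precision_drop)

current = metrics(tp=995, fp=22, fn=5)
candidate = metrics(tp=997, fp=20, fn=3)
worse = metrics(tp=990, fp=10, fn=10)
print(should_deploy(candidate, current), should_deploy(worse, current))  # True False
```

Weighting recall above precision reflects the cost asymmetry here: a missed defect reaches a customer, while a false alarm costs an inspector a second look.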
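The changeover logic reduces to a message handler that maps the part number published by the MES to a model configuration. The sketch assumes a JSON payload with a `part_number` field and a hypothetical registry of configurations; with the paho-mqtt client, a handler like this would be called from `on_message` with `msg.payload`. Raising on an unknown part number, rather than silently keeping the previous model, is a deliberate choice so that a part is never inspected with the wrong configuration.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    part_number: str
    engine_path: str  # hypothetical path to a serialized inference engine
    mode: str         # "detector" or "anomaly"
    threshold: float

# Hypothetical registry: part numbers with few labelled defects map to an anomaly model.
REGISTRY = {
    "PN-1001": ModelConfig("PN-1001", "engines/effnet_fpn_pn1001.plan", "detector", 0.50),
    "PN-2042": ModelConfig("PN-2042", "engines/ae_pn2042.plan", "anomaly", 0.0031),
}

class InspectionStation:
    def __init__(self, registry):
        self.registry = registry
        self.active = None

    def on_part_number_message(self, payload: bytes) -> ModelConfig:
        """Switch the active configuration; expects a payload like {"part_number": "PN-1001"}."""
        part_number = json.loads(payload)["part_number"]
        config = self.registry.get(part_number)
        if config is None:
            raise KeyError(f"no model configuration for part number {part_number!r}")
        self.active = config
        return config

station = InspectionStation(REGISTRY)
cfg = station.on_part_number_message(b'{"part_number": "PN-2042"}')
print(cfg.mode)  # anomaly
```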

Results

  • 89% reduction in defect escape rate (from 40% escape rate to 4.4%), measured over the first 6 months across all three production lines and 340+ part numbers
  • 99.5% defect recall, with 2.2% of flagged parts being false alarms (97.8% precision) - down from a 22% false positive rate with the prior machine vision system
  • ROI realized within the first quarter - $340K/month in quality costs reduced to approximately $41K/month, yielding nearly $900K in savings in the first 3 months
  • Warranty claims reduced by 72% within 6 months, leading both at-risk OEM customers to increase their order volumes
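A quick arithmetic check of how the reported figures relate to each other. The inputs are the numbers stated in the results; the false-alarm line assumes the 2.2% figure is the complement of the 97.8% precision reported for the model.

```python
# Inputs are the figures reported in the results; the arithmetic shows how they relate.
escape_before, escape_after = 0.40, 0.044
reduction = (escape_before - escape_after) / escape_before
print(f"escape-rate reduction: {reduction:.0%}")  # 89%

cost_before, cost_after = 340_000, 41_000  # USD per month
monthly_savings = cost_before - cost_after
print(f"first-quarter savings: ${3 * monthly_savings:,}")  # $897,000

precision = 0.978
print(f"flagged parts that are false alarms: {1 - precision:.1%}")  # 2.2%
```

Three months at roughly $299K/month of avoided quality costs comes to about $897K.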

Key Insight

The lighting and camera setup contributed more to detection accuracy than the choice of neural network architecture - no model can detect a defect that isn't visible in the input image.

Both of our largest OEM customers had us on quality probation. Six months after deployment, one of them increased their order volume by 30%. The system catches things our best inspectors couldn't see - micro-scratches, subtle discoloration from tool wear. We went from dreading quality audits to using the dashboard data to drive process improvements upstream.

Michael Kagan

CTO at NVIDIA
