Automated Assembly Monitoring with Synthetic Data
Manual assembly tasks are prone to errors such as missing assembly steps, incorrect step sequences, or imprecise component placements. These errors can lead to product failures, reduced quality, and potentially high costs. While computer vision models offer the potential to automate the recognition of critical assembly steps for quality control, their deployment is hindered by the high cost of real-world data collection and annotation.
Method
Recent advances in computer vision enable the generation of assembly motions and the use of synthetic data to train inspection models, improving the generalization and adaptability of automated assembly monitoring. Building on this, we propose a pipeline consisting of three key modules: physics-based assembly motion generation, photorealistic assembly rendering, and inspection model training, as illustrated below.
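To make the data flow between the three modules concrete, the sketch below outlines the pipeline as composable stages. It is a minimal illustration, not the authors' implementation: all names, signatures, and the stub bodies are hypothetical placeholders.

```python
from dataclasses import dataclass
from pathlib import Path

# Placeholder types for the artifacts passed between modules (hypothetical).
MotionClip = dict    # simulated trajectory of components, hands, and tools
RGBSequence = list   # rendered frames for one assembly execution


@dataclass
class AssemblySpec:
    """Pipeline inputs: CAD geometry, materials, and the assembly order."""
    cad_files: list[Path]      # product and tool CAD models
    materials: dict[str, str]  # component name -> material property
    sequence: list[list[str]]  # components assigned to each assembly step


def generate_motions(spec: AssemblySpec, variants: int) -> list[MotionClip]:
    """Module 1: physics-based simulation of varied assembly motions."""
    return [{"spec": spec, "seed": i} for i in range(variants)]  # stub


def render_sequences(clips: list[MotionClip]) -> list[RGBSequence]:
    """Module 2: photorealistic rendering with randomized disturbances."""
    return [[] for _ in clips]  # stub: one frame list per motion clip


def train_inspector(data: list[RGBSequence]):
    """Module 3: train a step-recognition model on the synthetic data."""
    return object()  # stub: stands in for a trained inspection model


def run_pipeline(spec: AssemblySpec, variants: int = 500):
    clips = generate_motions(spec, variants)
    frames = render_sequences(clips)
    return train_inspector(frames)
```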
Specifically, the pipeline takes as input the CAD files of the product and tools along with their material properties, as well as the assembly sequence with the components assigned to each step. The assembly motion generation module, adapted from [1], simulates diverse assembly motions while introducing random initial states, grasping poses, and movement velocities to reflect human variability. Next, the rendering module generates high-fidelity RGB assembly sequences with realistic component and tool materials while incorporating random lighting, backgrounds, and occlusions to simulate real-world disturbances. Finally, the inspection model training module leverages the synthetic data to train computer vision models for automated quality inspection.
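The randomization described above is a form of domain randomization. The sketch below shows one plausible way to sample the randomized parameters per rendered sequence; the parameter names and value ranges are assumptions for illustration, not values from the paper.

```python
import random


def sample_disturbances(rng: random.Random) -> dict:
    """Draw randomized scene parameters for one rendered sequence
    (lighting, background, and occlusions, per the rendering module)."""
    return {
        "light_intensity": rng.uniform(200.0, 1500.0),  # assumed range
        "light_position": [rng.uniform(-1, 1), rng.uniform(-1, 1),
                           rng.uniform(1, 2)],           # meters, assumed
        "background_id": rng.randrange(50),              # texture pool index
        "occluder_count": rng.randint(0, 3),             # distractor objects
    }


def sample_human_variability(rng: random.Random) -> dict:
    """Randomize initial state, grasp pose, and motion velocity to mimic
    operator-to-operator variation in the motion generation module."""
    return {
        "initial_offset_m": [rng.gauss(0.0, 0.02) for _ in range(3)],
        "grasp_pose_id": rng.randrange(20),   # index into candidate grasps
        "velocity_scale": rng.uniform(0.7, 1.3),
    }


# Usage: seed once so the generated dataset is reproducible.
rng = random.Random(42)
scene = sample_disturbances(rng)
motion = sample_human_variability(rng)
```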
Results
Depending on the assembly task, the method detects completed assembly steps with a very high success rate, and it can be adapted to new cases with less than two hours of manual effort.
These results highlight the effectiveness of leveraging synthetic data for automated assembly monitoring. Furthermore, the approach holds promise for handling more complex scenarios through enhanced hand-object interaction understanding, including fully occluded actions (e.g., tightening a screw with a manual screwdriver) and tasks with minimal visual changes (e.g., applying grease with fingers).
Publications
[1] Zhang, H., Christen, S., Fan, Z., Hilliges, O., Song, J.: GraspXL: Generating Grasping Motions for Diverse Objects at Scale. ECCV 2024.
For more information, please contact Hui Zhang or Julian Ferchow.