completed

Humanoid VLA: Vision-Language-Action Controlled Robot

Unitree G1 humanoid manipulation system driven by natural-language commands, ACT policies, and a full ROS 2 inference loop.

MuJoCo, ROS 2, ACT, Imitation Learning, PyTorch, Domain Randomization

Specifications
  • 4 single-arm tasks at 86% success, bimanual grasping at 100%
  • Domain randomization: 90% success in-distribution, degrading gracefully to 55% out-of-distribution (OOD)
  • Natural-language commands parsed into ACT inference at 30 Hz through ROS 2
  • 15.6M-parameter ACT model trained in about 2.5 hours on an RTX 4050
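
The language-to-policy step in the specifications can be illustrated with a minimal keyword parser. This is only a sketch: the task names and keywords in TASK_KEYWORDS are hypothetical stand-ins, since the page does not list the actual tasks, and the real parser may work quite differently. The only value taken from the project is the 30 Hz rate, which leaves roughly 33 ms per policy step.

```python
# Hypothetical task vocabulary; the project's real task names are not listed on this page.
TASK_KEYWORDS = {
    "reach": ("reach", "touch", "point"),
    "grasp": ("grasp", "grab", "pick"),
    "lift": ("lift", "raise"),
    "place": ("place", "put", "drop"),
}

CONTROL_HZ = 30                   # inference rate stated in the specifications
STEP_BUDGET_S = 1.0 / CONTROL_HZ  # time available for one policy step


def parse_command(text):
    """Return the first task whose keyword appears in the command, or None."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    for task, keywords in TASK_KEYWORDS.items():
        if any(word in keywords for word in words):
            return task
    return None
```

In a setup like this, the returned task name would select which policy (or which conditioning input) the 30 Hz inference loop runs, and an unrecognized command would leave the robot idle.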

Overview

This system joins language conditioning, robot perception, and action chunking into a single control loop for the Unitree G1 humanoid. The work focused on making a research stack stable enough for repeated evaluation instead of optimizing only for a polished demo.
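
To make the action-chunking part concrete, here is a small sketch of temporal ensembling, the blending scheme commonly paired with ACT: each policy call predicts a chunk of future actions, and the action executed at a given step is an exponentially weighted average of every prediction made for that step, with the oldest prediction weighted most. Whether this project uses ensembling, and with what decay, is not stated here; the ChunkEnsembler class and its parameters are illustrative assumptions.

```python
import math


class ChunkEnsembler:
    """Blend overlapping action chunks with exponential weights (illustrative)."""

    def __init__(self, chunk_size, decay=0.01):
        self.chunk_size = chunk_size
        self.decay = decay  # assumed value; larger favors older predictions more
        self.chunks = []    # (start_step, chunk) pairs, oldest first

    def add_chunk(self, start_step, chunk):
        """Store a chunk predicted at start_step; chunk[i] targets step start_step + i."""
        if len(chunk) != self.chunk_size:
            raise ValueError("chunk length must equal chunk_size")
        self.chunks.append((start_step, chunk))

    def action_at(self, step):
        """Return the weighted average of all stored predictions for this step."""
        # Drop chunks that ended before this step.
        self.chunks = [(s, c) for s, c in self.chunks if s + self.chunk_size > step]
        candidates = [c[step - s] for s, c in self.chunks if s <= step]
        if not candidates:
            raise LookupError("no chunk covers this step")
        # Weight index 0 belongs to the oldest prediction.
        weights = [math.exp(-self.decay * i) for i in range(len(candidates))]
        total = sum(weights)
        dim = len(candidates[0])
        return [
            sum(w * a[j] for w, a in zip(weights, candidates)) / total
            for j in range(dim)
        ]
```

A control loop would call add_chunk after each policy inference and action_at once per control tick before sending the joint command.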

Notes

The project paired scripted demonstrations with ACT training, then layered in domain randomization to push policy robustness beyond the nominal simulation setup.
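
Domain randomization of this kind usually amounts to resampling scene parameters at the start of each episode. The sketch below assumes a few plausible parameters and ranges; all names and numbers in NOMINAL_RANGES are hypothetical, since the page does not say what was randomized. The widen argument is likewise an assumption, showing one simple way to separate in-distribution evaluation (widen=1.0) from out-of-distribution evaluation (widen above 1.0).

```python
import random

# Hypothetical nominal ranges; the project's real randomized parameters are not listed.
NOMINAL_RANGES = {
    "object_x_offset_m": (-0.03, 0.03),
    "object_y_offset_m": (-0.03, 0.03),
    "friction_scale": (0.8, 1.2),
    "light_intensity_scale": (0.7, 1.3),
}


def sample_episode_params(rng, widen=1.0):
    """Draw one set of per-episode parameters.

    widen=1.0 samples inside the nominal ranges; widen > 1.0 stretches each
    range about its midpoint to probe out-of-distribution conditions.
    """
    params = {}
    for name, (low, high) in NOMINAL_RANGES.items():
        mid = 0.5 * (low + high)
        half = 0.5 * (high - low) * widen
        params[name] = rng.uniform(mid - half, mid + half)
    return params
```

Passing a seeded random.Random instance keeps evaluation episodes reproducible, which matters when comparing success rates across policy checkpoints.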

Project: humanoid-vla