Deploying VLAs on G1: Architecture & Integration

Duration: 65 min · Level: Advanced · Module: 5. Foundation Models & VLA Architecture · Focus: deployment, inference, hardware, architecture

A 3-to-7-billion-parameter VLA is a marvel in a data center and a problem on a robot. The G1 humanoid runs on a battery and carries its own compute, yet you want it to think with a foundation model and keep its balance at kilohertz rates. Those two demands cannot be met by one model on one loop. This lesson is about the engineering that reconciles them: how to allocate compute, split the control architecture into the right frequencies, shrink the model to fit, and — non-negotiably for healthcare — wrap the whole thing in a safety filter. By the end you will size a deployment that actually closes the loop.

The compute budget you're working against

The reference onboard accelerator is the NVIDIA Jetson AGX Orin (64 GB variant): 275 TOPS of INT8 compute and 64 GB of LPDDR5 memory. That is enough to run a 7B-parameter quantized model at about 5 Hz. Read that number carefully. Five hertz is sufficient for high-level task planning (deciding what to do, conditioned on language and vision), but it is nowhere near enough for high-frequency control. A robot cannot stay upright or react to contact on a 5 Hz loop. The hardware does not let you run the big model fast; it lets you run it slowly, which turns out to be exactly the right speed for one specific job.
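To make the budget concrete, here is a rough back-of-the-envelope sketch in Python. It estimates only the memory taken by model weights at a given precision and the time budget implied by a loop rate. The function names are hypothetical, and the estimate deliberately ignores activations, the KV cache, and the vision encoder, so treat the results as lower bounds rather than measured figures.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: ignores activations, KV cache, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


def period_ms(rate_hz: float) -> float:
    """Time available per loop iteration, in milliseconds."""
    return 1000.0 / rate_hz


if __name__ == "__main__":
    # Weight memory for a 7B-parameter model at common precisions.
    for bits in (16, 8, 4):
        print(f"7B @ {bits}-bit: {weight_footprint_gb(7e9, bits):.1f} GB")

    # Time budget per iteration at the planner rate and at a 1 kHz control rate.
    print(f"5 Hz planner: {period_ms(5):.0f} ms per inference")
    print(f"1 kHz controller: {period_ms(1000):.0f} ms per tick")
```

Under these assumptions, a 7B model needs about 14 GB of weights at 16-bit, 7 GB at 8-bit, and 3.5 GB at 4-bit, all of which fit within 64 GB. The binding constraint is time: a 5 Hz planner has 200 ms per inference, while a 1 kHz controller has 1 ms per tick.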

The two-level control architecture

The resolution to the frequency problem is to stop pretending one model does everything. Split control into two levels:

The VLA runs at 5-10 Hz as the high-level, language-conditioned task planner. It looks at the scene, reads the instruction, and decides the next sub-goal or action chunk.
A low-level reactive controller runs at 1-2 kHz for joint execution — balance, contact reaction, trajectory tracking. This is the loop that keeps the robot standing and safe.

This maps cleanly onto the dual-system idea now common across the field (NVIDIA's GR00T N1, for example, frames it as a slow System 2 feeding a fast System 1): a slow, smart planner setting goals for a fast, reflexive controller that executes them.
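The split can be illustrated with a minimal, simulated sketch. This is not the G1 control stack: the class name, the one-dimensional "joint" state, the proportional gain, and the stub planner are all assumptions made for illustration. The loop counts ticks instead of using real time, and it calls the planner synchronously, whereas a real deployment would run the VLA in a separate process or thread so that a slow inference never stalls the fast loop.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DualRateLoop:
    """Toy two-level loop: a slow planner sets goals, a fast controller tracks them."""
    planner_hz: float = 5.0        # high-level VLA rate (from the lesson: 5-10 Hz)
    controller_hz: float = 1000.0  # low-level rate (from the lesson: 1-2 kHz)
    gain: float = 0.01             # hypothetical proportional tracking gain

    def run(self, planner: Callable[[float], float],
            state: float, n_ticks: int) -> Tuple[float, int]:
        # Number of fast ticks between consecutive planner updates.
        ticks_per_plan = round(self.controller_hz / self.planner_hz)
        goal = state
        plan_calls = 0
        for tick in range(n_ticks):
            # Slow path: refresh the sub-goal once every ticks_per_plan ticks.
            if tick % ticks_per_plan == 0:
                goal = planner(state)
                plan_calls += 1
            # Fast path: every tick, move the state a small step toward the goal.
            state += self.gain * (goal - state)
        return state, plan_calls


if __name__ == "__main__":
    loop = DualRateLoop()
    # Stub planner that always requests position 1.0 (stands in for the VLA).
    final_state, calls = loop.run(lambda s: 1.0, state=0.0, n_ticks=1000)
    print(f"planner calls in 1 simulated second: {calls}")
    print(f"final state: {final_state:.4f}")
```

With the default rates, one simulated second contains 1000 controller ticks and only 5 planner calls, so the controller executes 200 corrections against each goal the planner produces.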
