One doc tagged with "TensorRT"

Model Compression for Edge Deployment

A 7B-parameter VLA model in FP32 requires 28 GB of memory for its weights and runs inference at <1 Hz on an AGX Orin: unusable
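
The 28 GB figure follows from simple arithmetic: 7 billion parameters at 4 bytes each. The sketch below reproduces that number and, as an illustration only, shows what the same weights would occupy at lower precisions. It assumes decimal gigabytes (10^9 bytes) and counts weights only, ignoring activations and runtime overhead. The function name `weight_memory_gb` and the non-FP32 rows are hypothetical additions, not figures from the original text.

```python
# Bytes needed to store one parameter at each precision.
# Only the FP32 case comes from the text; the others are illustrative.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Counts model weights only; activations, caches, and runtime
    overhead are not included.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 7_000_000_000  # 7B parameters, as stated in the text
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(params, precision):.1f} GB")
```

For the FP32 case this gives 28.0 GB, matching the figure above. Actual on-device memory use will be higher once activations and framework overhead are included.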