Frames, chunks, fps

Every spec in the catalog publishes its own wire shape: resolution, chunk size, frame rate, default rollout length. The SDK reads these live from model.spec.arch so your code doesn't have to hard-code anything spec-specific.

This page walks through the values — using the current active spec (DreamDojo · GR-1) as the concrete reference — and the engine's chunk-alignment rule that applies to every spec.

480×640 resolution

The spec config encodes resolution as [H, W] — height first, matching the engine's internal convention. The catalog JSON, the SDK (model.resolution), and the engine all agree on (480, 640). Don't swap to (640, 480); start_frame arrays of the wrong shape get rejected at the boundary.
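A local shape check along these lines can catch a swapped resolution before anything is sent. This is only a sketch: check_start_frame is a hypothetical helper, not part of the SDK, and the channel-last (H, W, 3) uint8 layout for start_frame is an assumption inferred from the shape of rollout.frames described further down.

```python
import numpy as np


def check_start_frame(frame: np.ndarray, resolution: tuple) -> None:
    """Raise ValueError unless frame has shape (H, W, 3) for the given (H, W).

    Hypothetical helper for illustration; the SDK performs its own validation.
    """
    height, width = resolution  # height first, matching the [H, W] convention
    expected = (height, width, 3)  # assumed channel-last RGB layout
    if frame.shape != expected:
        raise ValueError(
            f"start_frame has shape {frame.shape}, expected {expected}"
        )


resolution = (480, 640)  # in real code, read this from model.resolution

good = np.zeros((480, 640, 3), dtype=np.uint8)
check_start_frame(good, resolution)  # passes silently

swapped = np.zeros((640, 480, 3), dtype=np.uint8)
try:
    check_start_frame(swapped, resolution)
except ValueError as err:
    print(err)
```

Unpacking model.resolution as (height, width) in one place keeps the height-first convention from being silently flipped elsewhere in your code.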

48 frames

Internally, the runner generates frames in chunks of 12. The canonical rollout for GR-1 is 4 chunks = 48 frames.

You'll occasionally see "49-frame rollouts" in older marketing copy (including the catalog's bench_wall_s line). That's wrong: the regression suite — which is what's actually been benchmarked — runs 48-frame rollouts. Your synthetic example should use 48 too; dream.examples.dreamdojo_grasp() does.

Why not 49?

49 frames = 4 full chunks (48) + 1 trailing frame. Before the fix, that trailing partial chunk crashed the WAN2.1 tokenizer's time_conv: the cached and new inputs combined had a time dimension of only 2, smaller than the kernel size of 3. The engine now pads trailing partial chunks transparently, but the SDK still validates chunk alignment client-side and raises dream.InputValidationError. That keeps the wire honest and catches mistakes before you pay for the round-trip.

Bottom line: always pass a rollout length T that is a multiple of model.chunk_size. The SDK won't let you do otherwise.
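The alignment rule is plain modular arithmetic. The sketch below mirrors it locally; is_chunk_aligned and round_down_to_chunk are hypothetical helpers written for illustration, not SDK functions, and the SDK's own validation remains the source of truth.

```python
def is_chunk_aligned(num_frames: int, chunk_size: int) -> bool:
    """True when num_frames is a positive multiple of chunk_size."""
    return num_frames > 0 and num_frames % chunk_size == 0


def round_down_to_chunk(num_frames: int, chunk_size: int) -> int:
    """Largest multiple of chunk_size that does not exceed num_frames."""
    return (num_frames // chunk_size) * chunk_size


chunk_size = 12  # in real code, read this from model.chunk_size

print(is_chunk_aligned(48, chunk_size))     # True
print(is_chunk_aligned(49, chunk_size))     # False
print(round_down_to_chunk(49, chunk_size))  # 48
```

Rounding down rather than up is a design choice here: it never asks for more frames than the caller intended.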

10 fps

The mp4 output is encoded at 10 fps, which means a 48-frame rollout plays back as a 4.8-second video. That's the cadence the underlying GR00T teleop dataset was recorded at, and the model preserves it.

If you decode rollout.frames (a NumPy ndarray of shape (48, 480, 640, 3), dtype uint8), there is no fps metadata: the array holds only frames. The fps matters only when re-encoding for playback.
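Because the frame array carries no timing information, any timing has to be derived from the frame count and the fps. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the 10 fps constant is taken from this page rather than read from the SDK.

```python
def playback_seconds(num_frames: int, fps: float) -> float:
    """Duration of a clip with num_frames frames played at fps."""
    return num_frames / fps


def frame_timestamps(num_frames: int, fps: float) -> list:
    """Presentation time in seconds of each frame, starting at 0."""
    return [i / fps for i in range(num_frames)]


FPS = 10  # encoding rate stated on this page

print(playback_seconds(48, FPS))  # 4.8

timestamps = frame_timestamps(48, FPS)
print(timestamps[0], timestamps[-1])  # 0.0 4.7
```

The last frame starts at 4.7 s and is shown for one frame interval (0.1 s), which gives the 4.8-second total.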

Other specs

The catalog endpoint exposes per-spec values:

PYTHON

# Assumes `client` is an SDK client you have already constructed.
model = client.models.get("dreamdojo-2b-gr1")
print(model.action_dim)                    # 384
print(model.resolution)                    # (480, 640), i.e. (H, W)
print(model.chunk_size)                    # 12
print(model.spec.arch.default_num_steps)   # 35

When new specs ship (e.g. dreamdojo-2b-yam, dreamdojo-14b-gr1), they may change resolution, chunk_size, or action_dim. The SDK hydrates whatever the catalog returns; your code reads the live values rather than hard-coding assumptions.
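One way to keep code spec-agnostic is to derive every shape from the model object instead of from constants. The sketch below uses types.SimpleNamespace as a stand-in for the object returned by client.models.get(...), so it runs without the SDK. expected_frames_shape is a hypothetical helper, and the second spec's values are invented purely to show the effect.

```python
from types import SimpleNamespace


def expected_frames_shape(model, num_chunks: int) -> tuple:
    """Shape of the frames array for a rollout of num_chunks chunks.

    Reads resolution and chunk_size from the model object, so nothing
    here is specific to one spec.
    """
    height, width = model.resolution  # (H, W), height first
    return (num_chunks * model.chunk_size, height, width, 3)


# Stand-in with the GR-1 values documented on this page.
gr1_like = SimpleNamespace(resolution=(480, 640), chunk_size=12)
print(expected_frames_shape(gr1_like, 4))  # (48, 480, 640, 3)

# Invented values for a hypothetical future spec.
other_spec = SimpleNamespace(resolution=(240, 320), chunk_size=8)
print(expected_frames_shape(other_spec, 4))  # (32, 240, 320, 3)
```

The same function handles both specs unchanged, which is the point of reading the live values.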