# Visual MPC
Sample K candidate action sequences, run them all in one server-side forward pass, score the rollouts, and pick the best. The `predict_batch` method makes this one HTTP call.
## The pattern
```python
import numpy as np
import dream

client = dream.Client()
model = client.models.get("dreamdojo-2b-gr1")

def score(rollout: dream.Rollout) -> float:
    """Your task-specific reward. Could be:
    - distance to a goal frame
    - a learned reward model
    - a pixel-space cost like 'is the cup upright'
    """
    return reward_fn(rollout.frames)  # rollout.frames is (48, 480, 640, 3) uint8

# ── 1. Sample K candidate action sequences ────────────────────────────
K = 8
candidates = sample_candidates(K, T=48, action_dim=384)
# shape: (8, 48, 384) float32 — typically perturbations of a base plan

# ── 2. Run them all in one server roundtrip ───────────────────────────
batch = model.predict_batch(start_frame=current_frame, actions=candidates)

# ── 3. Score and pick ─────────────────────────────────────────────────
scores = [score(r) for r in batch]
best_idx = max(range(K), key=scores.__getitem__)
best_actions = candidates[best_idx]

print(f"K={K}, total cost ${batch.cost_usd}, wall {batch.wall_s:.2f}s")
batch[best_idx].save("best_rollout.mp4")
```

## Why batch over gather
Three reasons `predict_batch` beats firing K independent `model.predict` calls in parallel:
- Fused server-side forward. The K candidates share the same start-frame encoding and the diffusion model batches them, so wall-clock time is nearly flat in K: K=8 is roughly 25% slower than K=1, not 8×.
- One round trip. One TLS handshake, one redirect-follow, one response. K independent calls pay that transport overhead K times over.
- Same cost. $0.0005/frame whether batched or not: a K=8 batch and K=8 independent calls both bill K × frames-billed × $0.0005.
For DreamDojo on an H100, K=8 takes ~3.2 s end-to-end on a warm container. The same K=8 via `asyncio.gather` would take ~16 s.
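The billing claim is easy to sanity-check with plain arithmetic. A minimal sketch, using the per-frame price and the 49-frames-billed figure quoted later on this page (the constant names are mine, not the SDK's); batching changes wall time, not the bill:

```python
PRICE_PER_FRAME = 0.0005  # $/frame, same whether batched or gathered
FRAMES_BILLED = 49        # billed per rollout (per the Cost discipline section)

def rollout_cost() -> float:
    return FRAMES_BILLED * PRICE_PER_FRAME

def batch_cost(k: int) -> float:
    # Linear in K: the server fuses the forward pass but bills per frame.
    return k * rollout_cost()

print(f"K=8 either way: ${batch_cost(8):.3f}")  # → K=8 either way: $0.196
```

The only thing batching buys you is latency; the per-frame meter runs identically in both cases.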
## Sampling K candidates
The right way to generate a `(K, T, action_dim)` array depends on your problem. Common patterns:
```python
# Random search around a reference plan
ref = np.load("base_plan.npy")  # (48, 384)
noise = np.random.randn(K, 48, 384) * 0.1
candidates = (ref[None, :, :] + noise).astype(np.float32)

# Cross-entropy method — sample from a learned proposal distribution
candidates = cem_sample(prior_dist, K=8, T=48)

# Action-space lattice — sweep over a few discrete strategies
candidates = np.stack([base, base + d, base - d, base * 1.1])
```

## Scoring options
For most physics-grounded tasks, score on the predicted frames:
- Goal-distance — compute optical-flow / feature distance between the rollout's last frame and a target image.
- Learned reward model — pass `rollout.frames` through a vision reward network trained on human ratings.
- Latent prediction error — encode each frame with a VAE / DINO and measure trajectory smoothness in latent space.
Avoid scoring on raw pixel-MSE against a target — it's a notoriously poor proxy for task success.
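As a concrete sketch of the goal-distance option: compare the rollout's last frame to the goal in a pooled feature space rather than raw pixels. The `embed` below is a stand-in (coarse block averaging) for a real extractor like DINO; both function names are illustrative, not part of the dream SDK:

```python
import numpy as np

def embed(frame: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor: (480, 640, 3) uint8 → 144-dim vector.
    Swap in DINO / VAE features in practice; block means stand in here so
    the sketch runs without model weights."""
    patches = frame.reshape(6, 80, 8, 80, 3).mean(axis=(1, 3))  # (6, 8, 3)
    return patches.ravel() / 255.0

def goal_distance_score(frames: np.ndarray, goal_frame: np.ndarray) -> float:
    """Higher is better: cosine similarity between the rollout's last
    frame and the goal image, computed in feature space."""
    a, b = embed(frames[-1]), embed(goal_frame)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```

The same shape works for the learned-reward option: replace `embed` with your reward network and drop the goal argument.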
## Real-time MPC loop
```python
import asyncio
import dream

async def control_loop():
    async with dream.AsyncClient() as client:
        model = await client.models.get("dreamdojo-2b-gr1")
        current_frame = capture_camera_frame()
        while not done():
            candidates = sample_candidates(K=8, T=48, action_dim=384)
            batch = await model.predict_batch(
                start_frame=current_frame,
                actions=candidates,
            )
            scores = [score(r) for r in batch]
            best = candidates[max(range(8), key=scores.__getitem__)]
            execute_first_action(best[0])
            current_frame = capture_camera_frame()
```

Per-step wall: ~3.5 s on a warm container, dominated by the engine forward. The real-time loop budget depends on your task; for slow manipulation (cup-pouring, button-pressing) this is workable.
## Cost discipline
A batch costs roughly K × the per-rollout cost, where a GR-1 rollout is $0.0245 (49 frames billed × $0.0005). The actual `batch.cost_usd` comes in slightly lower because the server amortizes the shared start frame across K.
| K | Cost / batch (≈) | Cost / 1K batches (≈) |
|---|---|---|
| 4 | $0.098 | $98 |
| 8 | $0.196 | $196 |
| 16 | $0.392 | $392 |
If you're running a 1-Hz MPC loop for an hour, you'll hit ~3,600 batches. At K=8 that's ~$700. Tune K against your reward variance — often K=4 with smarter sampling beats K=16 with random search.
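To budget a run before launching it, the arithmetic above folds into a small helper. A sketch using the prices quoted in this section (the function name is mine); the real `batch.cost_usd` total will come in slightly lower thanks to start-frame amortization:

```python
PRICE_PER_FRAME = 0.0005   # $/frame on GR-1 (from this section)
FRAMES_BILLED = 49         # billed per rollout

def mpc_hourly_cost(k: int, hz: float = 1.0) -> float:
    """Upper bound on the hourly bill for a K-candidate MPC loop replanning
    at `hz` Hz. Ignores the small start-frame amortization discount."""
    batches_per_hour = 3600 * hz
    return batches_per_hour * k * FRAMES_BILLED * PRICE_PER_FRAME

print(f"K=8 @ 1 Hz: ~${mpc_hourly_cost(8):.0f}/hour")  # ~$706/hour
print(f"K=4 @ 1 Hz: ~${mpc_hourly_cost(4):.0f}/hour")  # ~$353/hour
```

Halving K halves the bill, which is why K=4 with a smarter proposal distribution is usually the first thing to try.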