Dream Engines

Visual MPC

Sample K candidate action sequences, run them all in one server-side forward pass, score the rollouts, pick the best. The predict_batch method makes this one HTTP call.

The pattern

PYTHON
import numpy as np
import dream

client = dream.Client()
model = client.models.get("dreamdojo-2b-gr1")

def score(rollout: dream.Rollout) -> float:
    """Your task-specific reward. Could be:
    - distance to a goal frame
    - a learned reward model
    - a pixel-space cost like 'is the cup upright'
    """
    return reward_fn(rollout.frames)  # rollout.frames is (48, 480, 640, 3) uint8

# ── 1. Sample K candidate action sequences ────────────────────────────
K = 8
candidates = sample_candidates(K, T=48, action_dim=384)
# shape: (8, 48, 384) float32 — typically perturbations of a base plan

# ── 2. Run them all in one server roundtrip ───────────────────────────
batch = model.predict_batch(start_frame=current_frame, actions=candidates)

# ── 3. Score and pick ─────────────────────────────────────────────────
scores = [score(r) for r in batch]
best_idx = max(range(K), key=scores.__getitem__)
best_actions = candidates[best_idx]

print(f"K={K}, total cost ${batch.cost_usd}, wall {batch.wall_s:.2f}s")
batch[best_idx].save("best_rollout.mp4")

Why batch over gather

Three reasons predict_batch beats firing K independent model.predict calls in parallel:

  1. Fused server-side forward. The K candidates share the same start-frame encoding, and the diffusion model runs them as one batch, so GPU wall-clock time stays close to the K=1 case: K=8 is roughly 25% slower than K=1, not 8× slower.
  2. One round trip. One TLS handshake, one redirect to follow, one response to parse. K independent calls pay that transport overhead K times over.
  3. Same cost either way. Pricing is $0.0005 per frame whether or not you batch: a K=8 batch and K=8 independent calls both cost roughly K × T × $0.0005. Batching buys speed, not a discount.

For DreamDojo on an H100, a K=8 batch takes ~3.2 s end-to-end against a warm container. The same eight rollouts fired through asyncio.gather take ~16 s. For comparison, the gather anti-pattern is sketched below.
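
A minimal sketch of that anti-pattern, assuming model.predict mirrors predict_batch's arguments but takes a single (T, action_dim) sequence (the single-call signature isn't shown in this section):

PYTHON
import asyncio
import dream

async def gather_rollouts(current_frame, candidates):
    # Anti-pattern: K separate forwards, K separate HTTP round trips.
    async with dream.AsyncClient() as client:
        model = await client.models.get("dreamdojo-2b-gr1")
        return await asyncio.gather(*(
            model.predict(start_frame=current_frame, actions=a)
            for a in candidates  # one call per candidate: no fused forward
        ))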

Sampling K candidates

The shape to generate is (K, T, action_dim); how you fill it depends on your problem. Common patterns:

PYTHON
import numpy as np

K = 8

# Random search around a reference plan
ref = np.load("base_plan.npy")  # (48, 384)
noise = np.random.randn(K, 48, 384) * 0.1
candidates = (ref[None, :, :] + noise).astype(np.float32)

# Cross-entropy method — sample from a learned proposal distribution
candidates = cem_sample(prior_dist, K=8, T=48)

# Action-space lattice — sweep over a few discrete strategies
# (base is a (48, 384) plan, d a fixed perturbation direction)
candidates = np.stack([base, base + d, base - d, base * 1.1])
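
cem_sample above is a placeholder. A minimal cross-entropy-method loop looks like the sketch below, reusing model and score from "The pattern"; the iteration count and elite fraction are illustrative, not tuned values.

PYTHON
import numpy as np

def cem_plan(model, current_frame, score, T=48, action_dim=384,
             K=8, iters=3, elite_frac=0.25):
    """Minimal CEM: sample K plans from a Gaussian, score the predicted
    rollouts, refit the Gaussian to the top elite_frac of plans."""
    mean = np.zeros((T, action_dim), dtype=np.float32)
    std = np.full((T, action_dim), 0.1, dtype=np.float32)
    n_elite = max(1, int(K * elite_frac))
    for _ in range(iters):
        candidates = (mean + std * np.random.randn(K, T, action_dim)).astype(np.float32)
        batch = model.predict_batch(start_frame=current_frame, actions=candidates)
        scores = np.array([score(r) for r in batch])
        elites = candidates[np.argsort(scores)[-n_elite:]]  # keep the best plans
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # the refit mean is the final plan

Note that each CEM iteration is one predict_batch call, so iters=3 triples the per-step cost relative to one-shot random search.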

Scoring options

For most physics-grounded tasks, score on the predicted frames:

  • Goal-distance — compute optical-flow / feature distance between the rollout's last frame and a target image.
  • Learned reward model — pass rollout.frames through a vision reward network trained on human ratings.
  • Latent prediction error — encode each frame with a VAE / DINO, measure trajectory smoothness in latent space.

Avoid scoring on raw pixel-MSE against a target — it's a notoriously poor proxy for task success.
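
As a concrete example of the first option, here is a sketch of a goal-distance scorer. embed is a hypothetical stand-in for whatever feature extractor you use (a DINO or VAE encoder, per the list above); it is assumed to map one (H, W, 3) uint8 frame to a 1-D feature vector.

PYTHON
import numpy as np

def make_goal_score(goal_frame, embed):
    """Build a scorer: negative feature distance between the rollout's
    final frame and a target image. `embed` is a user-supplied feature
    extractor, assumed here for illustration."""
    goal_feat = embed(goal_frame)
    def score(rollout) -> float:
        last_feat = embed(rollout.frames[-1])  # score only the final frame
        return -float(np.linalg.norm(last_feat - goal_feat))
    return score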

Real-time MPC loop

PYTHON
import asyncio
import dream

async def control_loop():
    async with dream.AsyncClient() as client:
        model = await client.models.get("dreamdojo-2b-gr1")
        current_frame = capture_camera_frame()
        K = 8
        while not done():
            candidates = sample_candidates(K=K, T=48, action_dim=384)
            batch = await model.predict_batch(
                start_frame=current_frame, actions=candidates,
            )
            scores = [score(r) for r in batch]
            best = candidates[max(range(K), key=scores.__getitem__)]
            execute_first_action(best[0])  # execute one action, then replan
            current_frame = capture_camera_frame()

Per-step wall clock is ~3.5 s on a warm container, dominated by the engine forward pass. Whether that fits your real-time budget depends on the task; for slow manipulation (cup pouring, button pressing) it is workable.

Cost discipline

A batch costs roughly K × the per-rollout cost; on GR-1 one rollout bills at $0.0245 (49 frames × $0.0005/frame). The actual batch.cost_usd comes in slightly lower, because the server amortizes the shared start frame across the K rollouts.

K     Cost / batch (≈)    Cost / 1K batches (≈)
4     $0.098              $98
8     $0.196              $196
16    $0.392              $392

If you run a 1 Hz MPC loop for an hour, that's 3,600 batches; at K=8, roughly $700. Tune K against your reward variance: K=4 with smarter sampling often beats K=16 with random search.
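
A back-of-the-envelope helper for this budgeting, using the $0.0005/frame rate and 49 billed frames per rollout quoted above (it ignores the shared-start-frame amortization, so it slightly overestimates):

PYTHON
def mpc_cost_usd(K, hz, hours, frames_per_rollout=49, usd_per_frame=0.0005):
    """Upper-bound MPC spend: batches x K x billed frames x rate."""
    batches = hz * 3600 * hours
    return batches * K * frames_per_rollout * usd_per_frame

print(mpc_cost_usd(K=8, hz=1, hours=1))  # 705.6, the ~$700 figure above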