Dream Engines

Updated 2026-05-06: now uses client.predict_many (shipping in dream-engine 0.3.0).

Data augmentation

Bootstrap a policy-training dataset by rolling out a world model from real teleop start frames + perturbed action sequences. Bit-stable on Hopper means the augmented set is reproducible — the same (seed, start_frame, actions) always emits the same mp4.

When this is useful

  • Your real-teleop dataset is small (<100 episodes) and you need 10× more for downstream policy training.
  • You have hard-to-reach states in the original demos (rare grasps, failure recoveries) and want to synthesize variations.
  • You want to evaluate a candidate policy against reproducible trajectories before committing to a real-robot trial.

The pattern

client.predict_many handles the iteration loop, concurrency, and retry for you. Point it at a source of episodes and a sink for the output videos:

PYTHON
import numpy as np
import dream
from pathlib import Path
from io import BytesIO
from PIL import Image
from dream.io import IterableSource, SourceRow, RolloutSink

class PerturbedEpisodeSource(IterableSource):
    """
    Loads real teleop episodes from disk and emits K perturbed variants
    of each as separate rows.
    """

    def __init__(self, episode_paths: list[Path], k_perturbs: int = 4):
        self._paths = episode_paths
        self._k = k_perturbs
        self.rows_hint = len(episode_paths) * k_perturbs

    def __iter__(self):
        for ep_path in self._paths:
            ep = np.load(ep_path)
            start = ep["start_frame"]  # (480, 640, 3) uint8
            base = ep["actions"]       # (48, 384) float32
            for i in range(self._k):
                # Seeded noise, so variant i of an episode is reproducible.
                noise = (np.random.RandomState(i)
                         .randn(*base.shape)
                         .astype(np.float32) * 0.05)
                actions = (base + noise).astype(np.float32)
                # Encode frame to PNG bytes.
                buf = BytesIO()
                Image.fromarray(start).save(buf, format="PNG")
                # Encode actions to .npy bytes.
                act_buf = BytesIO()
                np.save(act_buf, actions)
                yield SourceRow(
                    row_id=f"{ep_path.stem}_aug{i:02d}",
                    frame_bytes=buf.getvalue(),
                    actions_bytes=act_buf.getvalue(),
                    metadata={"seed": i, "source_episode": ep_path.stem},
                )

REAL_EPISODES = list(Path("/data/gr1_teleop").glob("ep_*.npz"))

client = dream.Client()
src = PerturbedEpisodeSource(REAL_EPISODES, k_perturbs=4)
sink = RolloutSink.dir("/data/augmented")

# Optional: check cost before running.
estimate = client.estimate_cost(src, spec="dreamdojo-2b-gr1")
print(f"≈ ${estimate.total_usd:.2f} for {estimate.rows} rows")

result = client.predict_many(
    src,
    sink,
    spec="dreamdojo-2b-gr1",
    concurrency=8,  # 8 concurrent requests; Modal autoscales
    on_error="skip",
    progress=True,
)
print(f"{result.ok} ok / {result.failed} failed → {result.output_uri}")
# /data/augmented/videos/ep_001_aug00.mp4 … ep_100_aug03.mp4
# /data/augmented/metadata.parquet
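
Once the run finishes, the parquet sidecar is the easiest way to audit coverage. A minimal sketch, assuming the file carries the fields you passed via SourceRow.metadata (the exact column set is an assumption; inspect meta.columns on your version before relying on it):

PYTHON
import pandas as pd

meta = pd.read_parquet("/data/augmented/metadata.parquet")
print(meta.columns.tolist())  # verify the schema first

# Assumed columns: seed and source_episode, as set in SourceRow.metadata.
counts = meta.groupby("source_episode").size()
missing = counts[counts < 4]  # episodes short of k_perturbs rollouts
print(f"{len(missing)} episodes incomplete (on_error='skip' drops failed rows)")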

For 100 real episodes × 4 perturbations at $0.0245 per rollout (the GR-1 per-rollout charge — 49 frames × $0.0005):

100 × 4 × $0.0245 = $9.80

See the bulk inference quickstart for a simpler walk-through using the public kingJulio/dream-engine-example-frames fixture.

Avoid predict_batch here

predict_batch shares one start frame across K rollouts. For augmentation you typically want many different start frames (one per real episode), each with a handful of perturbed action sequences. The predict_many pattern above handles this correctly: each PerturbedEpisodeSource row is an independent request.

If you do want K perturbations of the same starting frame (e.g. for ablation experiments or contrastive losses), predict_batch is cheaper.
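
For that case, something like the following works. predict_batch is real, but the one-frame-many-actions call shape below is an assumption; check the predict_batch reference for the actual signature:

PYTHON
import numpy as np
import dream
from io import BytesIO
from PIL import Image

ep = np.load("/data/gr1_teleop/ep_001.npz")
buf = BytesIO()
Image.fromarray(ep["start_frame"]).save(buf, format="PNG")

base = ep["actions"]  # (48, 384) float32
perturbed = [
    (base + np.random.RandomState(k).randn(*base.shape).astype(np.float32) * 0.05)
    for k in range(4)
]

client = dream.Client()
# Assumed call shape: one shared frame plus a list of action arrays.
rollouts = client.predict_batch(
    frame_bytes=buf.getvalue(),
    actions=perturbed,
    spec="dreamdojo-2b-gr1",
)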

Quality control

Augmented rollouts look reasonable while the perturbed actions stay in-distribution, but quality degrades as the perturbation magnitude grows. Two checks:

  1. Visual sanity — eyeball every 50th rollout. If the GR-1 suddenly teleports or the mug deforms, you've gone OOD.
  2. PSNR drift — score each augmented rollout against its unperturbed parent (see the sketch after this list). The engine ships at PSNR 22.35 dB on canonical inputs; augmented rollouts at perturb=0.05 typically land within 1 dB of that. Past a 2-3 dB drop, the synthetic data is noticeably hallucinatory and shouldn't be used for training.
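
A minimal drift check along these lines. The PSNR itself is pure NumPy; decoding the mp4s assumes imageio with a video-capable plugin such as pyav, and the file paths are illustrative:

PYTHON
import numpy as np
import imageio.v3 as iio  # assumes a video plugin (e.g. pyav) is installed

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """PSNR in dB between two same-shape uint8 frame stacks."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# Illustrative paths: the parent is a rollout of the unperturbed actions.
parent = iio.imread("/data/augmented/videos/ep_001_parent.mp4", plugin="pyav")
aug = iio.imread("/data/augmented/videos/ep_001_aug00.mp4", plugin="pyav")

drop = 22.35 - psnr(parent, aug)  # drift relative to the canonical 22.35 dB
if drop > 2.0:
    print(f"~{drop:.1f} dB below baseline; likely hallucinatory, exclude it")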

Reproducibility receipt

Save the inputs alongside the outputs:

PYTHON
np.savez(f"/data/augmented/{stem}_aug{i:02d}.npz",
seed=i,
start_frame=start,
actions=actions,
request_id=rollout.request_id,
engine_wall_ms=rollout.engine_wall_ms,
cost_usd=rollout.cost_usd)

Six months later, you can re-run any single augmented episode with the same seed and confirm the engine still produces the same bytes. That matters for reproducibility audits in research papers.
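
Verifying that is a byte-level comparison. A minimal sketch, assuming you've replayed the receipt's (seed, start_frame, actions) through predict_many into a fresh sink at /data/rerun (that path is illustrative):

PYTHON
import hashlib
from pathlib import Path

def sha256(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

original = sha256("/data/augmented/videos/ep_001_aug00.mp4")
rerun = sha256("/data/rerun/videos/ep_001_aug00.mp4")
assert original == rerun, "outputs differ; bit-stability regressed"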