Dream Engines

Updated 2026-05-06: now uses client.predict_many (shipping in dream-engine 0.3.0).

Data augmentation

Bootstrap a policy-training dataset by rolling out a world model from real teleop start frames + perturbed action sequences. Bit-stable on Hopper means the augmented set is reproducible — the same (seed, start_frame, actions) always emits the same mp4.

When this is useful

  • Your real-teleop dataset is small (<100 episodes) and you need 10× more for downstream policy training.
  • You have hard-to-reach states in the original demos (rare grasps, failure recoveries) and want to synthesize variations.
  • You want to evaluate a candidate policy against reproducible trajectories before committing to a real-robot trial.

The pattern

client.predict_many handles the iteration loop, concurrency, and retry for you. Point it at a source of episodes and a sink for the output videos:

PYTHON
import numpy as np
import dream
from pathlib import Path
from io import BytesIO
from PIL import Image
from dream.io import IterableSource, SourceRow, RolloutSink

class PerturbedEpisodeSource(IterableSource):
    """
    Loads real teleop episodes from disk and emits K perturbed variants
    of each as separate rows.
    """

    def __init__(self, episode_paths: list[Path], k_perturbs: int = 4):
        self._paths = episode_paths
        self._k = k_perturbs
        self.rows_hint = len(episode_paths) * k_perturbs

    def __iter__(self):
        for ep_path in self._paths:
            ep = np.load(ep_path)
            start = ep["start_frame"]  # (480, 640, 3) uint8
            base = ep["actions"]       # (48, 384) float32
            for i in range(self._k):
                # Seeded noise, so variant i of an episode is reproducible.
                noise = (np.random.RandomState(i)
                         .randn(*base.shape)
                         .astype(np.float32) * 0.05)
                actions = (base + noise).astype(np.float32)
                # Encode frame to PNG bytes.
                buf = BytesIO()
                Image.fromarray(start).save(buf, format="PNG")
                # Encode actions to .npy bytes.
                act_buf = BytesIO()
                np.save(act_buf, actions)
                yield SourceRow(
                    row_id=f"{ep_path.stem}_aug{i:02d}",
                    frame_bytes=buf.getvalue(),
                    actions_bytes=act_buf.getvalue(),
                    metadata={"seed": i, "source_episode": ep_path.stem},
                )

REAL_EPISODES = list(Path("/data/gr1_teleop").glob("ep_*.npz"))

client = dream.Client()
src = PerturbedEpisodeSource(REAL_EPISODES, k_perturbs=4)
sink = RolloutSink.dir("/data/augmented")

# Optional: check cost before running.
estimate = client.estimate_cost(src, spec="dreamdojo-2b-gr1")
print(f"≈ ${estimate.total_usd:.2f} for {estimate.rows} rows")

result = client.predict_many(
    src,
    sink,
    spec="dreamdojo-2b-gr1",
    concurrency=8,  # 8 concurrent requests; Modal autoscales
    on_error="skip",
    progress=True,
)
print(f"{result.ok} ok / {result.failed} failed → {result.output_uri}")
# /data/augmented/videos/ep_001_aug00.mp4 … ep_100_aug03.mp4
# /data/augmented/metadata.parquet
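
Once the run finishes, the parquet sidecar is the easiest way to audit coverage. A minimal sketch, assuming the file carries the fields you passed via SourceRow.metadata (the exact column set is an assumption; inspect meta.columns on your version before relying on it):

PYTHON
import pandas as pd

meta = pd.read_parquet("/data/augmented/metadata.parquet")
print(meta.columns.tolist())  # verify the schema first

# Assumed columns: seed and source_episode, as set in SourceRow.metadata.
counts = meta.groupby("source_episode").size()
missing = counts[counts < 4]  # episodes short of k_perturbs rollouts
print(f"{len(missing)} episodes incomplete (on_error='skip' drops failed rows)")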

For 100 real episodes × 4 perturbations at $0.0245 per rollout (the GR-1 per-rollout charge — 49 frames × $0.0005):

100 × 4 × $0.0245 = $9.80

See the bulk inference quickstart for a simpler walk-through using the public kingJulio/dream-engine-example-frames fixture.

Avoid predict_batch here

predict_batch shares one start frame across K rollouts. For augmentation you typically want many different start frames (one per real episode), each with a handful of perturbed action sequences. The predict_many pattern above handles this correctly: each PerturbedEpisodeSource row is an independent request.

If you do want K perturbations of the same starting frame (e.g. for ablation experiments or contrastive losses), predict_batch is cheaper.
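
For that case, something like the following works. predict_batch is real, but the one-frame-many-actions call shape below is an assumption; check the predict_batch reference for the actual signature:

PYTHON
import numpy as np
import dream
from io import BytesIO
from PIL import Image

ep = np.load("/data/gr1_teleop/ep_001.npz")
buf = BytesIO()
Image.fromarray(ep["start_frame"]).save(buf, format="PNG")

base = ep["actions"]  # (48, 384) float32
perturbed = [
    (base + np.random.RandomState(k).randn(*base.shape).astype(np.float32) * 0.05)
    for k in range(4)
]

client = dream.Client()
# Assumed call shape: one shared frame plus a list of action arrays.
rollouts = client.predict_batch(
    frame_bytes=buf.getvalue(),
    actions=perturbed,
    spec="dreamdojo-2b-gr1",
)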

Quality control

Augmented rollouts look reasonable while the perturbed actions stay in-distribution, but quality degrades as the perturbation magnitude grows. Two checks:

  1. Visual sanity — eyeball every 50th rollout. If the GR-1 suddenly teleports or the mug deforms, you've gone OOD.
  2. PSNR drift — score each augmented rollout against its unperturbed parent (see the sketch after this list). The engine ships at PSNR 22.35 dB on canonical inputs; augmented rollouts at perturb=0.05 typically land within 1 dB of that. Past a 2-3 dB drop, the synthetic data is noticeably hallucinatory and shouldn't be used for training.
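
A minimal drift check along these lines. The PSNR itself is pure NumPy; decoding the mp4s assumes imageio with a video-capable plugin such as pyav, and the file paths are illustrative:

PYTHON
import numpy as np
import imageio.v3 as iio  # assumes a video plugin (e.g. pyav) is installed

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """PSNR in dB between two same-shape uint8 frame stacks."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# Illustrative paths: the parent is a rollout of the unperturbed actions.
parent = iio.imread("/data/augmented/videos/ep_001_parent.mp4", plugin="pyav")
aug = iio.imread("/data/augmented/videos/ep_001_aug00.mp4", plugin="pyav")

drop = 22.35 - psnr(parent, aug)  # drift relative to the canonical 22.35 dB
if drop > 2.0:
    print(f"~{drop:.1f} dB below baseline; likely hallucinatory, exclude it")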

Reproducibility receipt

Save the inputs alongside the outputs:

PYTHON
np.savez(f"/data/augmented/{stem}_aug{i:02d}.npz",
seed=i,
start_frame=start,
actions=actions,
request_id=rollout.request_id,
engine_wall_ms=rollout.engine_wall_ms,
cost_usd=rollout.cost_usd)

Six months later, you can re-run any single augmented episode with the same seed and confirm the engine still produces the same bytes. That matters for reproducibility audits in research papers.
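
Verifying that is a byte-level comparison. A minimal sketch, assuming you've replayed the receipt's (seed, start_frame, actions) through predict_many into a fresh sink at /data/rerun (that path is illustrative):

PYTHON
import hashlib
from pathlib import Path

def sha256(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

original = sha256("/data/augmented/videos/ep_001_aug00.mp4")
rerun = sha256("/data/rerun/videos/ep_001_aug00.mp4")
assert original == rerun, "outputs differ; bit-stability regressed"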