Dream Engines
TL;DR

$0.0005 per generated frame. About $0.0245 per standard 49-frame rollout.

Prepaid credits — no tiers, no auto-renew, never expire.

Estimate (100 rollouts / day)

$ / day      $2.45
$ / month    $73.50

4,900 frames/day · $0.0245/rollout fixed · 30-day month
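The estimate above can be reproduced in a few lines. The helper below is illustrative, not part of the SDK; the 100 rollouts/day figure is implied by 4,900 frames/day, and integer mils keep the arithmetic exact:

```python
MILS_PER_FRAME = 5         # $0.0005/frame; 10,000 mils = $1
FRAMES_PER_ROLLOUT = 49

def cost_usd(rollouts: int) -> float:
    # Charge in mils is exact; convert to USD only for display.
    return rollouts * FRAMES_PER_ROLLOUT * MILS_PER_FRAME / 10_000

print(cost_usd(100))       # $/day at 100 rollouts/day -> 2.45
print(cost_usd(100 * 30))  # $ for a 30-day month -> 73.5
```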

Dream Engine — pricing

$0.0005 per generated frame. Flat rate, every customer. ~$0.0245 per standard 49-frame rollout. Paid out of a prepaid credit balance topped up via Stripe Checkout — no tiers, no commitments, no warm-pool minimum.

This page is the single source of truth — when prices change, this file updates first, the website second, customer emails third. Internal margin policy is at the bottom.


How it works

  1. The customer mints (or receives) an API key. Each key has a prepaid credit balance, denominated in USD cents.
  2. The customer tops up the balance with a one-time Stripe Checkout payment — client.billing.topup(amount_usd=25.00) returns a hosted checkout URL.
  3. Each predict call pre-debits the predicted cost (frames × $0.0005, exact in mils; see "Sub-cent precision" below) from the balance before the engine runs.
  4. Engine errors trigger an automatic refund. Insufficient balance returns HTTP 402 before any GPU work fires.
import dream
client = dream.Client()                                # reads DREAM_API_KEY
print(client.billing.balance().balance_usd)            # e.g. 47.75

# Run a rollout — debits 245 mils ($0.0245) on the way through.
r = client.models.get("dreamdojo-2b-gr1").predict(
    start_frame=img, actions=acts,
)

# Top up when low.
session = client.billing.topup(amount_usd=25.00)
print("Open this in a browser to pay:", session.url)

What you get

  • Flat per-frame pricing — $0.0005, regardless of resolution, batch size, or volume.
  • Three-metric quality gate enforced (PSNR / SSIM / LPIPS — see docs/RESULTS.md).
  • Per-frame transparency on every response (X-DreamEngine-Frames, X-DreamEngine-Estimated-Charge-USD headers, surfaced as rollout.cost_usd).
  • Status page (/v1/status) with rolling 24h P50 / P99 latency.
  • Automatic refunds on engine error — you only pay for frames the engine successfully delivered.
  • All currently shipped optimisations enabled by default (Fused QKV, LUT conditioning, TeaCache, T5 cache, guidance=0 short-circuit).
  • A typed Python SDK that surfaces 402 as dream.InsufficientCreditsError carrying both the current balance and the requested amount.

Sub-cent precision

The credits ledger stores balances in mils (1 mil = $0.0001 = 1/100 of a cent), so per-frame charges are exact, not rounded. At $0.0005/frame:

  • 1 frame = exactly 5 mils ($0.0005)
  • 49 frames (canonical DreamDojo rollout) = exactly 245 mils ($0.0245)
  • $5.00 = exactly 50,000 mils

This matters at scale: pre-0.2.1 the engine rounded the 49-frame rollout to 2¢ ($0.02), under-charging by 18% per call. Post-0.2.1 every rollout is billed exactly. Across 1M rollouts/year that's ~$4,500 of revenue we used to lose to rounding.

Customers see balance_mils (exact) plus balance_cents and balance_usd (derived display values) on every ledger response.
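A sketch of those derived views plus the rounding-loss arithmetic. The `ledger_views` helper is illustrative, not the SDK's API; only the field names come from the docs:

```python
def ledger_views(balance_mils: int) -> dict:
    """Exact balance in mils plus the two derived display values."""
    return {
        "balance_mils": balance_mils,           # exact, integer
        "balance_cents": balance_mils / 100,    # derived display value
        "balance_usd": balance_mils / 10_000,   # derived display value
    }

print(ledger_views(477_500)["balance_usd"])   # 47.75

# Pre-0.2.1 loss: a 245-mil rollout billed as 200 mils (2 cents).
print((245 - 200) * 1_000_000 / 10_000)       # 4500.0 USD/yr at 1M rollouts
```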

Top-up bounds

Limit             USD
Minimum top-up    $5
Maximum top-up    $10,000

The SDK validates these client-side before any HTTP call. The engine enforces the same bounds server-side. Need a larger top-up? Email hello@dreamengines.run.
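A minimal sketch of the client-side check. The function name is hypothetical; the real SDK may surface this differently:

```python
MIN_TOPUP_USD = 5.00
MAX_TOPUP_USD = 10_000.00

def validate_topup(amount_usd: float) -> None:
    """Reject out-of-bounds top-ups before any HTTP call is made."""
    if not MIN_TOPUP_USD <= amount_usd <= MAX_TOPUP_USD:
        raise ValueError(
            f"top-up must be between ${MIN_TOPUP_USD:.2f} and "
            f"${MAX_TOPUP_USD:,.2f}, got ${amount_usd:,.2f}"
        )

validate_topup(25.00)       # fine
# validate_topup(2.00)      # raises ValueError, no request sent
```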

Rate limits

A per-key token bucket catches abuse on top of the credits ledger. Default knobs on a fresh key:

Setting       Default
qps refill    2.0
burst         10

When the bucket empties the engine returns 429 with Retry-After; the SDK retries automatically. Need higher limits for a planning loop? Email and we'll dial up your key.
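For intuition, a toy token bucket with those defaults. This is a sketch of the concept, not the engine's implementation:

```python
import time

class TokenBucket:
    """Per-key bucket: refills at `refill_qps` tokens/sec, capped at `burst`."""
    def __init__(self, refill_qps: float = 2.0, burst: int = 10):
        self.refill_qps = refill_qps
        self.burst = burst
        self.tokens = float(burst)      # a fresh key starts with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.refill_qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                    # engine answers 429 + Retry-After here

bucket = TokenBucket()
print(sum(bucket.allow() for _ in range(15)))   # 10: the burst passes, the rest throttle
```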

What you don't get yet (post-v1)

  • Streaming output (/v1/predict/stream SSE) — gated on the v0.6 streaming runner.
  • Continuous-batching scheduler (vLLM-style) — concurrent requests serialise on the GPU today.
  • Multi-region failover — single Modal region for v1.
  • Self-serve signup — onboarding is done by hand via scripts/create_api_key.py for the first cohort.
  • WebRTC realtime / fal-WMA bridge.

Reference: how a customer's bill computes

debit_mils_for_predict
  = num_frames × 5 mils
  = num_frames × $0.0005, exact (see "Sub-cent precision"; no per-call rounding)
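The same computation as runnable code, using the mils representation from "Sub-cent precision" so the charge stays exact:

```python
MILS_PER_FRAME = 5          # $0.0005 expressed in mils (1 mil = $0.0001)

def debit_mils(num_frames: int) -> int:
    """Exact charge for a predict call; no per-call rounding."""
    return num_frames * MILS_PER_FRAME

def debit_usd(num_frames: int) -> float:
    return debit_mils(num_frames) / 10_000

print(debit_mils(49), debit_usd(49))    # 245 0.0245
```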

A planner doing visual MPC at K=8 fused candidates, 10 decisions/sec for 1 hour:

1 hour × 10 decisions/sec × 8 candidates × 49 frames = 14,112,000 frames
14,112,000 × $0.0005 = $7,056

By comparison, self-hosting that same workload directly on Modal H100s requires 14,112,000 / 49 = 288,000 rollouts × ~3 s engine_wall = ~240 hours of GPU time = ~$948 of raw Modal compute — but spread across weeks of engineering effort to set up, optimise, monitor, and harden.

Markup: ~7.4× over self-hosted compute. That's what the customer pays for the bundle of optimised inference + zero ops + per-frame metering + customer support. If a customer needs significant volume on committed terms, email and we'll figure it out per-customer rather than via a self-serve discount tier.
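The arithmetic above, checked end to end (rates and timings are the ones quoted in this section):

```python
decisions = 3600 * 10           # 1 hour at 10 decisions/sec
rollouts = decisions * 8        # K=8 fused candidates
frames = rollouts * 49

customer_usd = frames * 5 / 10_000        # $0.0005/frame, via mils
gpu_hours = rollouts * 3 / 3600           # ~3 s engine_wall per rollout
selfhost_usd = gpu_hours * 3.95           # Modal H100 SXM rate

print(frames, customer_usd)                                        # 14112000 7056.0
print(round(selfhost_usd), round(customer_usd / selfhost_usd, 1))  # 948 7.4
```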


Internal margin policy (not customer-facing)

Cost basis (Modal H100 SXM @ $3.95/hr)

Component                                                Time             Cost
Engine wall (warm)                                       2.62 s           $0.0029
HTTP transit + mp4 encode                                ~1.4 s           $0.0015
Marginal cost / rollout (warm)                           ~4 s             $0.0044
Marginal cost / frame                                                     $0.00009
Cold-start amortisation (~70 s / 100 rollouts/session)   ~0.7 s/rollout   +$0.00077
Allocated cost / frame                                                    ~$0.0001
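The table can be re-derived from the hourly rate; a quick check, using the component timings listed above:

```python
usd_per_sec = 3.95 / 3600                 # Modal H100 SXM

engine_wall_s = 2.62                      # warm engine wall
transit_s = 1.4                           # HTTP transit + mp4 encode
cold_s = 70 / 100                         # ~70 s cold start / ~100 rollouts/session

marginal_rollout = (engine_wall_s + transit_s) * usd_per_sec
allocated_frame = (engine_wall_s + transit_s + cold_s) * usd_per_sec / 49

print(round(marginal_rollout, 4))         # 0.0044 per rollout
print(round(allocated_frame, 4))          # 0.0001 per frame
```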

Margin

Item                      Value
Selling price / frame     $0.0005
Allocated cost / frame    ~$0.0001
Gross margin              80%

The flat-pricing model trades the slim per-frame discount we used to give Scale-tier customers ($0.0004) for simplicity: one number, one billing flow, no tier-eligibility footnotes. At expected v1 volume (<100M frames/mo across the cohort) the lost margin is in the noise; at high volume we negotiate per-customer (see "Re-price triggers" below).

Margin floor

If allocated cost ever exceeds 30% of selling price (margin drops below 70%), we either:

  1. Re-optimise the engine (next levers: VAE compile, Conv FP8, self-forcing checkpoint training).
  2. Renegotiate Modal pricing or move workloads to cheaper providers (nebius, runpod) when commercially viable.
  3. Raise prices, with 30 days' notice to customers.

Current state: ~20% allocated cost ratio, 80% gross margin (per the table above). Healthy.

Cold-start economics

We ship with min_containers=0 and a tightened scaledown_window=60s (was 600s). First request after idle pays the ~70 s cosmos load tax. Switching to an always-warm container costs $3.95/hr × 24 × 30 ≈ $2,840 / month / container of pure GPU rent. We'll flip when:

  • A paying customer complains about cold-start latency in writing, OR
  • The cohort's blended traffic crosses the break-even point ($2,840 / $0.0005 = 5.68M frames/month attributable to the gap), OR
  • One large customer commits to >2M frames/month and asks for warm-pool as part of the deal.

The 60s scaledown is a conscious bet that bursty traffic is the common case for early customers (a single eval batch, then quiet); idle dollars dominated at 600s.
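The warm-container break-even, made explicit (the "~$2,840/month" above rounds the exact $2,844 figure down):

```python
warm_month_usd = 3.95 * 24 * 30           # always-warm H100, 30-day month
breakeven_frames = warm_month_usd / 0.0005

print(round(warm_month_usd))              # 2844 dollars of pure GPU rent
print(round(breakeven_frames))            # 5688000 frames/month to justify warm pool
```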

Re-price triggers

  • A new Modal GPU class (B200, H200, …) shifts the cost basis — re-derive marginal cost, decide whether to pass savings on or pocket margin.
  • A new optimisation lands and drops cost by ≥ 20% — 50/50 split between price reduction (customer-facing) and margin (re-investment in engineering).
  • Competitive pressure from fal / Together / Anyscale serving cosmos directly.
  • A customer asks for a committed-use discount (>1M frames/month for 12 months) — handle per-customer; flat list price stays.

History

Date         Change                                                 Reason
2026-05-05   Tiers dropped — flat $0.0005/frame, prepaid credits.   Cleaner billing story for early users. Frees us to advertise "$0.0245/rollout, no commitments" without footnotes about tier eligibility. Scale-tier discount ($0.0004) absorbed back into list price.
2026-05-04   v1 pricing published — Pro $0.0005/frame, free 1K/mo.  Track C launch (now superseded).