CLOUDFLARE
CONTAINERS
Run Docker containers on the Cloudflare edge — spawned on demand, managed by Workers and Durable Objects. Any language, any runtime, full Linux environment, with Workers handling the ingress and routing.
WHAT ARE CLOUDFLARE CONTAINERS
Cloudflare Containers lets you run standard Docker containers as part of your Workers application. The Worker handles all the incoming requests, authentication, and routing — then hands off to a container when the work needs a full Linux environment, more memory, heavier CPU, or a specific runtime that V8 isolates can't provide.
The key difference from every other container platform: you don't deploy a container to a fixed server or cluster. Containers are spawned on demand, globally, and managed by a Durable Object that acts as the container's programmable lifecycle controller. When traffic stops, the container sleeps. When a request arrives, it wakes up within seconds.
Currently in beta — requires the Workers Paid plan ($5/month). Containers bill by active usage: vCPU-seconds for CPU, GiB-seconds for memory, and GB-seconds for disk.
Containers is in open beta as of early 2026. The API and configuration format may change before GA. Check developers.cloudflare.com/containers for the latest.
CONTAINERS VS WORKERS
Workers (V8 isolates) remain the default. Use Containers when the work genuinely requires what Workers can't provide. The two are designed to compose — not compete.
Use Workers when:
- Request/response latency is critical — sub-millisecond cold starts
- You can write in JavaScript, TypeScript, or WebAssembly
- The task completes in a single event loop turn
- You need global distribution across 300+ PoPs
- Storage via KV, D1, R2, or Durable Objects is sufficient
- CPU time per request stays under 30 seconds
Use Containers when:
- You need Python, Go, Rust, Java, or any language beyond JS/Wasm
- The task is CPU-intensive (ML inference, image processing, video encoding)
- You need more than 128 MB RAM — up to 16 GiB is available
- A full filesystem or specific Linux tools are required
- You're running an existing Docker image without rewriting it
- You need long-running stateful sessions with session-affinity routing
The practical pattern: Worker handles auth, rate limiting, request parsing, and routing. Container handles the heavy computation. Worker calls container via fetch() on its exposed port — same as calling any internal service.
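A minimal sketch of that division of labor, assuming a container bound as MY_CONTAINER (the isAuthorized check is a stand-in; a real gateway would verify a signed token or JWT):

```typescript
// Sketch: Worker as gateway, container as backend (names are illustrative).
export function isAuthorized(authHeader: string | null): boolean {
  // Stand-in check; real code would verify a JWT or session token.
  return authHeader !== null && authHeader.startsWith("Bearer ");
}

export default {
  async fetch(req: Request, env: any): Promise<Response> {
    if (!isAuthorized(req.headers.get("Authorization"))) {
      return new Response("Unauthorized", { status: 401 });
    }
    // Heavy work is handed off to the container behind a Durable Object.
    const id = env.MY_CONTAINER.idFromName("session-abc123");
    return env.MY_CONTAINER.get(id).fetch(req);
  }
};
```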
HOW IT WORKS
Three layers compose to give you programmable container lifecycle management on the Cloudflare edge.
- Worker: the stateless entry point. Handles ingress, auth, and routing, then forwards requests to the Durable Object via fetch(). The Worker has no state — it's stateless by design.
- Durable Object: the container's lifecycle controller. It starts, sleeps, and stops the container and proxies requests to it. Its class_name in wrangler.toml must match your DO class.
- Container: your linux/amd64 Docker image. Receives HTTP requests from the Durable Object. Full Linux environment — any language, any tool, any filesystem layout.
Location selection
When a container is first spawned, Cloudflare selects the nearest edge location that has your container image pre-fetched. All subsequent requests to that Durable Object route to the same physical location (session affinity). If the container stops and restarts, it may land in a different location based on availability and image presence.
Session affinity
Because Durable Objects are globally unique and have a fixed home location, every request that goes to a given Durable Object will go to the same container instance. This makes stateful patterns — browser sessions, long-running jobs, game servers, language runtimes — straightforward to implement.
CONTAINER LIFECYCLE
Containers have five distinct states. The two below determine your cold start strategy and cost model:
- Stopped: the instance does not count against max_instances. Starts on the next request.
- Sleeping: entered after the sleepAfter duration of inactivity. Memory preserved, billing paused, sub-second wake.
Configure sleepAfter to keep the container in a cheap sleeping state between requests. Without it, the container shuts down when traffic stops, and the next request incurs a full cold start. For interactive sessions (coding assistants, game servers), sleep is almost always the right default.
Graceful shutdown
When a container is stopped — manually or due to a deployment rollout — it receives SIGTERM. Your process should catch this and complete in-flight work within the 15-minute window. After 15 minutes, SIGKILL terminates the process immediately. All disk content is lost — containers have no persistent storage between restarts.
# Python: catch SIGTERM for graceful shutdown
import signal, sys

def handle_sigterm(sig, frame):
    print("SIGTERM received, flushing work...")
    # flush queues, write final state to R2 or D1
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
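The same pattern in Node, for containers whose entrypoint is a JavaScript or TypeScript server (flushWork and the pending queue are stand-ins for whatever draining your app needs):

```typescript
// Node: catch SIGTERM, drain in-flight work, then exit cleanly.
let pending: string[] = ["job-1", "job-2"]; // stand-in work queue

export async function flushWork(): Promise<number> {
  const flushed = pending.length;
  // Real code would flush queues and write final state to R2 or D1 here.
  pending = [];
  return flushed;
}

process.on("SIGTERM", async () => {
  console.log("SIGTERM received, flushing work...");
  await flushWork();
  process.exit(0); // exit well before the SIGKILL deadline
});
```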
INSTANCE TYPES
Six predefined instance sizes cover most workloads. For unusual ratios, custom sizing is available with any combination within the platform maximums.
| Type | vCPU | Memory | Disk | Best for |
|---|---|---|---|---|
| lite | 1/16 (0.0625) | 256 MiB | 2 GB | Lightweight daemons, small scripts, proxies |
| basic | 0.25 | 512 MiB | 2 GB | REST APIs, small Python/Go services |
| standard-1 | 1 | 2 GiB | 5 GB | Web apps, LLM tokenisation, data processing |
| standard-2 | 2 | 4 GiB | 10 GB | ML inference (small models), parallel jobs |
| standard-3 | 3 | 8 GiB | 15 GB | Image processing, medium model inference |
| standard-4 | 4 | 12 GiB | 20 GB | Heavy compute, large model inference, ffmpeg |
| custom | 0.0625–4.0 | up to 16 GiB | up to 20 GB | Any ratio — min 3 GiB per vCPU |
GPU instances are not available in the current beta. For heavy inference, use Workers AI (which runs on Cloudflare's GPU fleet) and call it from your Worker or container. Use containers for CPU-bound preprocessing, postprocessing, or custom Python toolchains around Workers AI calls.
WRANGLER.TOML CONFIGURATION
Containers are declared in wrangler.toml alongside your Durable Object bindings. The class_name field is the bridge — it must match the Durable Object class that manages the container lifecycle.
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2026-01-01"
# ── Container declaration ─────────────────────────────────
[[containers]]
class_name = "MyContainer" # Must match a Durable Object class
image = "./Dockerfile" # Local Dockerfile, or registry URL
instance_type = "basic" # lite | basic | standard-1..4 | custom
max_instances = 10           # Default: 20 (stopped instances don't count)
# ── Durable Object binding ─────────────────────────────────
[[durable_objects.bindings]]
name = "MY_CONTAINER" # How you reference it in Worker code
class_name = "MyContainer" # Same class_name as above
[[migrations]]
tag = "v1"
new_sqlite_classes = ["MyContainer"]
# ── For custom instance sizing ─────────────────────────────
# [[containers]]
# class_name = "HeavyWorker"
# image = "./Dockerfile.heavy"
# instance_type = "custom"
# [containers.resources]
# vcpu = 2.0
# memory_mib = 8192
# disk_mb = 15000
Referencing from Worker code
// src/index.ts
export default {
async fetch(req: Request, env: Env): Promise<Response> {
// Route to a specific container instance by ID
const id = env.MY_CONTAINER.idFromName("session-abc123");
const stub = env.MY_CONTAINER.get(id);
// Forward the request to the container
return stub.fetch(req);
}
};
// Container Durable Object class
export class MyContainer extends Container {
defaultPort = 8080;
sleepAfter = "5m"; // Sleep after 5 minutes of inactivity
override async containerStarted(): Promise<void> {
console.log("Container is up and serving");
}
}
DEPLOYING CONTAINERS
Deployment is a single command — wrangler deploy — but it does more work under the hood than a standard Worker deploy.
# Prerequisites: Docker must be running
docker info # Verify Docker is up
# Bootstrap from template
npm create cloudflare@latest -- --template=cloudflare/templates/containers-template
# Deploy (builds image, pushes to CF registry, deploys Worker)
npx wrangler deploy
# Monitor running containers
npx wrangler containers list
# View deployed images in registry
npx wrangler containers images list
What happens during wrangler deploy
- Wrangler calls Docker locally to build the image from your Dockerfile
- Built image is pushed to Cloudflare's Container Registry (backed by R2)
- Only changed layers are pushed on subsequent deploys — fast incremental updates
- Worker code is deployed and linked to the container class
- Cloudflare pre-fetches the image to edge locations based on your traffic patterns
- First deployment takes several minutes before containers are ready to serve
New container versions deploy via a rolling strategy. Running instances receive SIGTERM and drain gracefully while new instances come up. There is no downtime if your container handles SIGTERM correctly.
Bringing an external image
You can use any public Docker Hub image, or a private image from Docker Hub or Amazon ECR. Configure registry credentials first, then push the image manually to Cloudflare's registry:
# Configure private registry credentials (stored as secrets)
npx wrangler containers registries configure
# Push an external image
npx wrangler containers push docker.io/myorg/myimage:latest
# Reference in wrangler.toml
# image = "registry.cloudflare.com/myaccount/myimage:latest"
IMAGE REGISTRY
Cloudflare maintains a private container registry at registry.cloudflare.com, backed by R2. Every wrangler deploy with a local Dockerfile pushes to this registry automatically. You never need to manage registry credentials for your own images.
| Registry | Private repos | Auth | Egress |
|---|---|---|---|
| registry.cloudflare.com | Yes (your account) | Automatic via wrangler | Free (R2-backed) |
| Docker Hub (public) | No | None needed | Rate limits apply |
| Docker Hub (private) | Yes | wrangler containers registries configure | May incur egress |
| Amazon ECR | Yes | wrangler containers registries configure | AWS egress charges apply |
Deleting an image from the Cloudflare registry is irreversible. If you delete an image that a previous Worker version references, rolling back that Worker version will fail. Tag images carefully and retain images for any Worker version you may need to roll back to.
Total image storage per account is capped at 50 GB during beta. Images must use the linux/amd64 architecture — ARM images are not supported.
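If you build on an ARM machine (such as Apple Silicon), pin the target platform explicitly so the resulting image is amd64. This is a sketch using standard Docker syntax, not a Cloudflare-specific directive:

```dockerfile
# Force an amd64 image even when building on an ARM host
FROM --platform=linux/amd64 python:3.12-slim
```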
NETWORKING
Containers communicate with the outside world exclusively through their Durable Object. The Durable Object holds a handle to the container and proxies HTTP requests to whatever port the container exposes.
Port configuration
Set defaultPort in your Durable Object class — this is the port your container process must listen on. Standard convention is 8080. Requests arrive as plain HTTP fetch() calls, not raw TCP.
export class MyContainer extends Container {
defaultPort = 8080; // Must match EXPOSE in Dockerfile
// Override to use a different port per-request
async fetch(req: Request): Promise<Response> {
return this.containerFetch(req, { port: 8080 });
}
}
Environment variables
Pass environment variables to the container process via the Durable Object's envVars option. Cloudflare also injects a CLOUDFLARE_DEPLOYMENT_ID variable automatically. Never hardcode secrets — pass them from wrangler secret put through the Worker binding:
export class MyContainer extends Container {
defaultPort = 8080;
override getOptions(): ContainerOptions {
return {
envVars: {
OPENAI_API_KEY: this.env.OPENAI_API_KEY, // From Worker secret
APP_ENV: "production",
}
};
}
}
Container-to-container communication
Containers cannot directly call each other. All inter-container communication goes through Workers or Durable Objects. The Worker acts as the broker — it calls Durable Object A, which calls its container, then calls Durable Object B if needed. This keeps the routing model clean and auditable.
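A sketch of that broker shape, with each container-backed Durable Object reduced to a simple text-in/text-out call for illustration (the /step1 and /step2 paths are hypothetical):

```typescript
// Sketch: the Worker brokers between two container-backed services.
// In a real Worker, each call would be stub.fetch() on a Durable Object stub.
type ContainerCall = (path: string, body: string) => Promise<string>;

export async function brokered(
  a: ContainerCall,
  b: ContainerCall,
  input: string
): Promise<string> {
  const intermediate = await a("/step1", input); // container A transforms the input
  return b("/step2", intermediate);              // container B consumes A's output
}
```

Because the Worker sits between the two calls, it can log, rate-limit, or abort the pipeline at each hop — which is the auditability the text describes.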
USE CASES
The right mental model: Workers do everything they can; Containers handle the rest. The sections below show real-world patterns where containers fit naturally.
HOW 2NTH.AI USES CONTAINERS
At 2nth.ai, Workers handle the vast majority of workloads — API routing, auth, session management, token tracking, KV reads. Containers are reserved for tasks that specifically need what Workers can't provide.
Python AI toolchains
Some agent pipelines use Python libraries (LangChain, sentence-transformers, custom parsers) that can't run in a V8 isolate. These run as basic or standard-1 containers with a sleepAfter of 10 minutes. The Worker calls the container, waits for the result, and returns it. Container stays warm for repeat calls within the window.
Document processing
Uploading a large PDF or Excel file to an agent? The Worker writes it to R2, then calls a container with the R2 object key. The container reads from R2, parses the document using Python libraries (pdfplumber, openpyxl), extracts structured data, writes the result back to R2, and the Worker picks it up. No timeouts, no memory pressure on the isolate.
Cost pattern
With sleepAfter = "10m" and typical 2–3 second per-document processing on the basic instance type, the per-document cost is under $0.0001. The sleeping container costs nothing. Wake-up on demand keeps the experience snappy without paying for idle compute.
// Worker — calls Python container for document parsing
export default {
async fetch(req: Request, env: Env): Promise<Response> {
const { r2Key } = await req.json();
// Route to document parser container (session by r2Key)
const id = env.DOC_PARSER.idFromName(r2Key);
const stub = env.DOC_PARSER.get(id);
const result = await stub.fetch(new Request("http://container/parse", {
method: "POST",
body: JSON.stringify({ r2Key }),
headers: { "Content-Type": "application/json" }
}));
return result;
}
};
SETUP GUIDE
Step 1 — Prerequisites
# Verify Docker is running
docker info
# Ensure you have the latest wrangler
npm install -g wrangler@latest
wrangler --version # 3.x or higher required
# Confirm Workers Paid plan is active in your Cloudflare dashboard
Step 2 — Scaffold from template
npm create cloudflare@latest my-container-app \
-- --template=cloudflare/templates/containers-template
cd my-container-app
Step 3 — Inspect the template structure
my-container-app/
├── Dockerfile # Your container image
├── src/
│ └── index.ts # Worker + Durable Object class
└── wrangler.toml # [[containers]] configuration
Step 4 — Write your Dockerfile
# Example: Python FastAPI container
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
Step 5 — Configure wrangler.toml
name = "my-container-app"
main = "src/index.ts"
compatibility_date = "2026-01-01"
[[containers]]
class_name = "MyContainer"
image = "./Dockerfile"
instance_type = "basic"
max_instances = 5
[[durable_objects.bindings]]
name = "MY_CONTAINER"
class_name = "MyContainer"
[[migrations]]
tag = "v1"
new_sqlite_classes = ["MyContainer"]
Step 6 — Add secrets
npx wrangler secret put DATABASE_URL
npx wrangler secret put API_KEY
Step 7 — Deploy
# First deploy — takes several minutes for image pre-fetch
npx wrangler deploy
# Watch containers spin up
npx wrangler containers list
# Tail live logs
npx wrangler tail
Cloudflare needs to pre-fetch your image to edge locations after the first deploy. Allow 3–5 minutes before the container is ready to serve requests. Subsequent deploys with unchanged images are much faster.
PRICING
Containers bill by active usage in 10-millisecond increments. Sleeping containers incur no charges. The Workers Paid plan ($5/month) is a prerequisite and includes free allocations that cover light container usage.
| Resource | Free included | Beyond free tier |
|---|---|---|
| Memory (GiB-seconds) | 25 GiB-hours/month | $0.0000025 per GiB-second |
| CPU (vCPU-seconds) | 375 vCPU-minutes/month | $0.000020 per vCPU-second |
| Disk (GB-seconds) | 200 GB-hours/month | $0.00000007 per GB-second |
| Egress — NA / Europe | 1 TB/month | $0.025/GB |
| Egress — other regions | 500 GB/month | $0.04/GB (Oceania/Korea: $0.05) |
| Image storage | Included up to 50 GB (beta) | TBD post-beta |
Worked example — document parser at basic tier
Processing 10,000 documents/month, 3 seconds each, basic instance (0.25 vCPU, 512 MiB RAM):
Active time: 10,000 × 3s = 30,000 seconds
CPU: 0.25 vCPU × 30,000s = 7,500 vCPU-seconds
Free: 22,500 vCPU-seconds → Covered by free tier
Memory: 0.5 GiB × 30,000s = 15,000 GiB-seconds
Free: 90,000 GiB-seconds → Covered by free tier
Disk: 2 GB × 30,000s = 60,000 GB-seconds
Free: 720,000 GB-seconds → Covered by free tier
Estimated container cost: ~$0 beyond Workers Paid plan base
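The arithmetic above can be sketched as a small helper using the beyond-free-tier rates from the pricing table (free allocations ignored, so this is the worst case):

```typescript
// Per-second unit rates beyond the free tier (from the pricing table).
const RATES = { perVcpuSec: 0.000020, perGibSec: 0.0000025, perGbSec: 0.00000007 };

export function containerCost(
  activeSeconds: number,
  vcpu: number,
  memGiB: number,
  diskGB: number
): number {
  return (
    activeSeconds *
    (vcpu * RATES.perVcpuSec + memGiB * RATES.perGibSec + diskGB * RATES.perGbSec)
  );
}

// One 3-second document on a basic instance (0.25 vCPU, 0.5 GiB, 2 GB disk):
export const perDoc = containerCost(3, 0.25, 0.5, 2); // ≈ $0.000019
```

Even ignoring the free tier entirely, 10,000 documents cost roughly 19 cents — and with the free allocations applied, this workload rounds to zero.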
A sleeping container incurs zero container charges. The only cost during sleep is the underlying Durable Object storage — negligible. Configure sleepAfter aggressively for workloads with bursty traffic patterns.
PLATFORM LIMITS
Beta-phase limits apply per account. These are expected to increase at GA.
| Limit | Value (beta) | Notes |
|---|---|---|
| max_instances per container | 20 (default) | Configurable; stopped instances don't count |
| Total vCPU across all active instances | 1,500 | Per account |
| Total memory across all active instances | 6 TiB | Per account |
| Total disk across all active instances | 30 TB | Per account |
| Image storage | 50 GB | Per account; may change at GA |
| Max vCPU per instance | 4.0 | Custom sizing up to this limit |
| Max memory per instance | 16 GiB | Minimum 3 GiB per vCPU |
| Max disk per instance | 20 GB | All ephemeral — lost on restart |
| Architecture | linux/amd64 only | ARM not supported in beta |
| Cold start time | 2–3 seconds typical | Depends on image size and edge location |
| SIGTERM grace window | 15 minutes | Before SIGKILL on shutdown |
Container disk is fully ephemeral. For data that must survive restarts, write to R2 (object storage), D1 (SQLite), or KV from inside the container. Worker bindings are not available inside the container process, so use the Cloudflare REST API or R2's S3-compatible API over HTTP — the container has full outbound network access.