Multi-Agent Deployment

Docker setup

Dockerfile

FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Persistent MLS state directory — mount as a volume
VOLUME /data/skytale

ENV SKYTALE_DATA_DIR=/data/skytale

CMD ["python", "agent.py"]

docker-compose.yml

services:
  agent-alice:
    build: .
    environment:
      - SKYTALE_API_KEY=${SKYTALE_API_KEY}
      - SKYTALE_RELAY=https://relay.skytale.sh:5000
      - SKYTALE_API_URL=https://api.skytale.sh
    volumes:
      - alice-data:/data/skytale
    restart: unless-stopped

  agent-bob:
    build: .
    environment:
      - SKYTALE_API_KEY=${SKYTALE_API_KEY}
      - SKYTALE_RELAY=https://relay.skytale.sh:5000
      - SKYTALE_API_URL=https://api.skytale.sh
    volumes:
      - bob-data:/data/skytale
    restart: unless-stopped

volumes:
  alice-data:
  bob-data:

Environment variable reference

Variable	Description	Default
`SKYTALE_API_KEY`	API key for authentication (`sk_live_...`)	— (required for production)
`SKYTALE_RELAY`	Relay server URL	`https://relay.skytale.sh:5000`
`SKYTALE_API_URL`	API server URL	`https://api.skytale.sh`
`SKYTALE_DATA_DIR`	Directory for MLS state persistence	`~/.skytale/<identity_hex>`
`SKYTALE_MOCK`	Enable mock mode (`1`, `true`, `yes`)	`false`
`SKYTALE_IDENTITY`	Default agent identity (TypeScript SDK)	—
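If your own startup code needs the same settings (for logging, or to fail fast on a missing key), the following minimal sketch resolves them from the environment using the defaults listed above. `load_skytale_env` and `SkytaleEnv` are hypothetical names and are not part of skytale_sdk. Treating `<identity_hex>` as `identity.hex()` is also an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Defaults copied from the environment variable table above.
DEFAULT_RELAY = "https://relay.skytale.sh:5000"
DEFAULT_API_URL = "https://api.skytale.sh"
TRUTHY = {"1", "true", "yes"}  # accepted values for SKYTALE_MOCK


@dataclass(frozen=True)
class SkytaleEnv:
    api_key: Optional[str]
    relay: str
    api_url: str
    data_dir: Path
    mock: bool


def load_skytale_env(identity: bytes, env: Optional[Mapping[str, str]] = None) -> SkytaleEnv:
    """Hypothetical helper: resolve settings the way the table describes."""
    env = os.environ if env is None else env
    # Assumption: <identity_hex> in the default path is identity.hex().
    default_dir = Path.home() / ".skytale" / identity.hex()
    return SkytaleEnv(
        api_key=env.get("SKYTALE_API_KEY"),
        relay=env.get("SKYTALE_RELAY", DEFAULT_RELAY),
        api_url=env.get("SKYTALE_API_URL", DEFAULT_API_URL),
        data_dir=Path(env.get("SKYTALE_DATA_DIR", default_dir)),
        mock=env.get("SKYTALE_MOCK", "false").strip().lower() in TRUTHY,
    )
```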

Channel lifecycle

Bootstrap: creator agent

One agent must create the channel first. This agent becomes the MLS group owner and processes join requests from other agents.

from skytale_sdk import SkytaleChannelManager

# The creator must run first and stay running
creator = SkytaleChannelManager(
    identity=b"orchestrator",
    data_dir="/data/skytale",
)
creator.create("myorg/agents/tasks")

# Generate invite tokens for other agents
tokens = []
for i in range(5):
    token = creator.invite("myorg/agents/tasks", max_uses=1, ttl=3600)
    tokens.append(token)
# Distribute tokens to joining agents (env vars, config files, API, etc.)
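The following sketch shows one way to handle the distribution step with the config-file pattern: mint one single-use token per worker and write it to that worker's own JSON file. `write_worker_configs`, the `mint` callable, and the file layout are all hypothetical. In the creator script above, `mint` would wrap `creator.invite("myorg/agents/tasks", max_uses=1, ttl=3600)`.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def write_worker_configs(
    out_dir: Path,
    workers: Iterable[str],
    channel: str,
    mint: Callable[[], str],
) -> Dict[str, Path]:
    """Mint one token per worker and write it to <out_dir>/<worker>.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: Dict[str, Path] = {}
    for name in workers:
        path = out_dir / f"{name}.json"
        # One single-use token per worker: a leaked file admits at most one agent.
        config = {"identity": name, "channel": channel, "invite_token": mint()}
        path.write_text(json.dumps(config, indent=2))
        path.chmod(0o600)  # tokens are secrets; restrict to the file owner
        written[name] = path
    return written


# Usage in the creator script (sketch):
# write_worker_configs(
#     Path("/config/tokens"), ["worker-1", "worker-2"], "myorg/agents/tasks",
#     lambda: creator.invite("myorg/agents/tasks", max_uses=1, ttl=3600),
# )
```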

Scaling: token-based join

Other agents join using invite tokens. They don’t need to know about each other; each one needs only the channel name and a valid invite token issued by the creator.

import os
from skytale_sdk import SkytaleChannelManager

worker = SkytaleChannelManager(
    identity=b"worker-1",
    data_dir="/data/skytale",
)

token = os.environ["SKYTALE_INVITE_TOKEN"]
worker.join_with_token("myorg/agents/tasks", token)

# Now the worker can send and receive on the channel
worker.send("myorg/agents/tasks", "worker-1 online")

Token distribution patterns

Pattern	When to use	How
Environment variable	Fixed agent set, known at deploy time	`SKYTALE_INVITE_TOKEN` in docker-compose
Config file	Agents read config on startup	JSON/TOML file mounted as a volume
API endpoint	Dynamic agent scaling	Creator exposes an endpoint that returns tokens
Shared store	Kubernetes or orchestrator-managed	Store tokens in a Secret or KV store
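On the worker side, a small helper can support the first two patterns: read `SKYTALE_INVITE_TOKEN` if it is set, and otherwise fall back to a mounted JSON config file. `resolve_invite_token` is a hypothetical helper, not an SDK function. The `invite_token` key is an assumed layout for your own config files.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_invite_token(
    config_path: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Try the environment-variable pattern first, then the config-file pattern."""
    env = os.environ if env is None else env
    token = env.get("SKYTALE_INVITE_TOKEN")
    if token:
        return token
    if config_path is not None and config_path.is_file():
        # Assumed file layout: {"invite_token": "..."}
        data = json.loads(config_path.read_text())
        token = data.get("invite_token")
        if token:
            return token
    raise LookupError("no invite token in SKYTALE_INVITE_TOKEN or the config file")


# Usage (sketch):
# worker.join_with_token("myorg/agents/tasks",
#                        resolve_invite_token(Path("/config/tokens/worker-1.json")))
```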

Multi-use tokens

For auto-scaling scenarios where new agents spin up dynamically:

# Create a reusable token (up to 100 uses, valid for 24 hours)
token = creator.invite("myorg/agents/tasks", max_uses=100, ttl=86400)
# Store in a shared secret manager
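A multi-use token still expires after its TTL. A creator that keeps handing tokens to new agents may therefore want to mint a replacement before expiry. The sketch below uses a hypothetical `RotatingToken` class that caches one token and replaces it when the token nears its TTL or when the uses it has handed out reach `max_uses`. Uses are counted locally, so the class cannot see redemptions of copies distributed some other way.

```python
import time
from typing import Callable, Optional


class RotatingToken:
    """Hypothetical cache that re-mints a multi-use invite token before it expires.

    `mint` is any zero-argument callable returning a fresh token, for example
    lambda: creator.invite("myorg/agents/tasks", max_uses=100, ttl=86400).
    """

    def __init__(self, mint: Callable[[], str], ttl: float, max_uses: int,
                 margin: float = 0.1, clock: Callable[[], float] = time.monotonic):
        self._mint = mint
        self._ttl = ttl
        self._max_uses = max_uses
        self._margin = margin  # fraction of the TTL kept as a safety buffer
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self._uses = 0  # uses handed out by this object, not server-side redemptions

    def get(self) -> str:
        now = self._clock()
        near_expiry = now >= self._expires_at - self._ttl * self._margin
        if self._token is None or near_expiry or self._uses >= self._max_uses:
            self._token = self._mint()
            self._expires_at = now + self._ttl
            self._uses = 0
        self._uses += 1
        return self._token
```

As written, this class is not thread-safe. If an API endpoint calls `get()` from several threads, guard it with a `threading.Lock`.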

Failure recovery

Agent restart with persistent state

If an agent restarts but its data_dir is intact (volume-mounted), it can resume without rejoining:

# On restart, recreate the manager with the same identity and data_dir
mgr = SkytaleChannelManager(
    identity=b"worker-1",
    data_dir="/data/skytale",  # Volume-mounted, survived restart
)
# Channels are restored from local MLS state
# No need to rejoin — just start sending/receiving
mgr.send("myorg/agents/tasks", "worker-1 back online")

Agent restart without state (data_dir lost)

If the data_dir is lost, the agent must rejoin with a new invite token:

mgr = SkytaleChannelManager(
    identity=b"worker-1",
    data_dir="/data/skytale",
)

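# Need a fresh invite token from the channel owner.
# get_new_token_from_orchestrator() is a placeholder for your own distribution
# mechanism (API call, secret store, etc.); it is not part of skytale_sdk.
new_token = get_new_token_from_orchestrator()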
mgr.join_with_token("myorg/agents/tasks", new_token)
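A worker can choose between the two paths above automatically by inspecting its data_dir before it constructs the manager. The check should run first because constructing the manager might itself create files there. This is a heuristic sketch that assumes the SDK persists its MLS state as files inside data_dir. `has_local_state` and `startup_mode` are hypothetical helpers.

```python
from pathlib import Path


def has_local_state(data_dir: Path) -> bool:
    """Heuristic: treat a non-empty data_dir as persisted MLS state."""
    return data_dir.is_dir() and any(data_dir.iterdir())


def startup_mode(data_dir: Path) -> str:
    """Return "resume" if local state appears to exist, otherwise "rejoin"."""
    return "resume" if has_local_state(data_dir) else "rejoin"


# Usage (sketch):
# mode = startup_mode(Path("/data/skytale"))  # check BEFORE creating the manager
# mgr = SkytaleChannelManager(identity=b"worker-1", data_dir="/data/skytale")
# if mode == "rejoin":
#     mgr.join_with_token("myorg/agents/tasks", get_new_token_from_orchestrator())
```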

Graceful shutdown

Always call close() before stopping an agent to cleanly shut down background threads:

Python:

import signal
import sys

def shutdown(signum, frame):
    mgr.close()
    sys.exit(0)

signal.signal(signal.SIGTERM, shutdown)
signal.signal(signal.SIGINT, shutdown)

Or use the context manager:

with SkytaleChannelManager(identity=b"agent") as mgr:
    mgr.create("org/ns/chan")
    run_agent_loop(mgr)  # placeholder for your agent's own main loop
# mgr.close() called automatically on exit

TypeScript:
process.on("SIGTERM", () => {
  mgr.close();
  process.exit(0);
});

process.on("SIGINT", () => {
  mgr.close();
  process.exit(0);
});
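For Python, an alternative to calling `sys.exit()` inside the handler is to let the handler only set a flag. The main loop then exits normally and calls `mgr.close()` from a `finally` block, so cleanup also runs if the loop raises. This is a sketch. `install_shutdown_handlers`, `run_until_stopped`, and the `step` callable are hypothetical names, and the approach assumes each step returns reasonably quickly (for example, a receive with a timeout).

```python
import signal
import threading
from typing import Callable, Protocol


class Closeable(Protocol):
    def close(self) -> None: ...


def install_shutdown_handlers(stop: threading.Event) -> None:
    """Set `stop` on SIGTERM/SIGINT instead of exiting inside the handler."""
    def handler(signum, frame):
        stop.set()
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)


def run_until_stopped(mgr: Closeable, step: Callable[[], None],
                      stop: threading.Event) -> None:
    """Call `step` until `stop` is set, then always close the manager."""
    try:
        while not stop.is_set():
            step()
    finally:
        mgr.close()


# Usage (sketch; handle_one_message is a hypothetical function):
# stop = threading.Event()
# install_shutdown_handlers(stop)
# run_until_stopped(mgr, lambda: handle_one_message(mgr), stop)
```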

Production checklist

Storage

data_dir is mounted as a persistent volume (Docker volume, EBS, PVC)
Each agent has its own unique data_dir — never shared
Backup strategy for data_dir if channel state is critical
Volume permissions allow the agent process to read/write
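The first and last Storage items can be checked at startup. The minimal sketch below uses a hypothetical helper, `check_data_dir`. It creates data_dir if needed, writes and deletes a probe file, and reports whether the path is a mount point. The mount-point test is only a heuristic: a Docker volume or PVC mounted at data_dir normally shows up as one, while a plain directory in the container's filesystem does not.

```python
import os
import tempfile
from pathlib import Path


def check_data_dir(path: Path) -> bool:
    """Fail fast if data_dir is not writable; return whether it looks like a mount."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        # The probe file is created inside data_dir and deleted when closed.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".write-probe-") as probe:
            probe.write(b"ok")
            probe.flush()
            os.fsync(probe.fileno())
    except OSError as exc:
        raise RuntimeError(f"data_dir {path} is not writable: {exc}") from exc
    # False suggests state lives in the container's own filesystem.
    return os.path.ismount(path)
```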

Authentication

SKYTALE_API_KEY set via environment variable, not hardcoded
API key stored in a secrets manager (AWS Secrets Manager, Vault, K8s Secret)
Separate API keys per environment (dev, staging, production)

Networking

Relay URL configured correctly (SKYTALE_RELAY)
Outbound TCP 5000 (gRPC) and UDP 4433 (QUIC) allowed through firewall
Health check: curl https://relay.skytale.sh:5000/health
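The curl check above exercises the relay's HTTP endpoint. For a lower-level preflight from inside the agent container, the sketch below uses only the standard library to open a TCP connection to the host and port in `SKYTALE_RELAY`. `relay_tcp_reachable` is a hypothetical helper. It covers TCP 5000 only and cannot confirm that UDP 4433 (QUIC) is open.

```python
import socket
from urllib.parse import urlsplit


def relay_tcp_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the relay URL's host:port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host in relay URL: {url!r}")
    # Fall back to the scheme's standard port if the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


# Usage (sketch):
# relay_tcp_reachable(os.environ.get("SKYTALE_RELAY", "https://relay.skytale.sh:5000"))
```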

Reliability

Graceful shutdown handler (SIGTERM / SIGINT) calls mgr.close()
Container restart policy set to unless-stopped or always
Error handling for all 5 exception types (see Error handling guide)
Logging configured to capture SkytaleError codes for monitoring

Agent identity

Each agent has a unique, stable identity across restarts
Identity is deterministic (not randomly generated on each start)
No two running agents share the same identity

Common pitfalls

Losing `data_dir`

The most common production issue. Without persistent MLS state, agents cannot decrypt messages on existing channels. Always use a volume mount.

# docker-compose.yml — WRONG: no volume
services:
  agent:
    build: .
    # No named volume: MLS state is not guaranteed to survive the container being removed or recreated

# docker-compose.yml — CORRECT: persistent volume
services:
  agent:
    build: .
    volumes:
      - agent-data:/data/skytale
    environment:
      - SKYTALE_DATA_DIR=/data/skytale

Agents on separate machines need invite tokens

Agents cannot join channels by just knowing the channel name. The MLS protocol requires a cryptographic handshake mediated by invite tokens. There is no “open” channel that anyone can join.

# WRONG: trying to join without a token
bob.create("org/ns/chan")  # This creates a NEW channel, not joins Alice's

# CORRECT: use invite token from the channel owner
token = alice.invite("org/ns/chan")
# Send token to Bob (env var, API call, config file, etc.)
bob.join_with_token("org/ns/chan", token)

Sharing `data_dir` between agents

Each agent identity must have its own data_dir. Sharing causes MLS epoch conflicts and decryption failures.

# WRONG: shared volume
services:
  agent-1:
    volumes:
      - shared-data:/data/skytale  # Both agents write to same dir
  agent-2:
    volumes:
      - shared-data:/data/skytale  # MLS state conflicts

# CORRECT: separate volumes
services:
  agent-1:
    volumes:
      - agent1-data:/data/skytale
  agent-2:
    volumes:
      - agent2-data:/data/skytale
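An accidentally shared volume can be caught at startup, rather than later through decryption failures, by having the agent take an exclusive lock on a file in its data_dir before it creates the manager. The sketch below assumes a Unix host (it uses `fcntl.flock`) and assumes the SDK tolerates an extra file in data_dir. `DataDirLock` and the `.agent.lock` file name are hypothetical. Containers on the same host that share a Docker volume see the same lock, but flock behavior on network filesystems varies.

```python
import fcntl
import os
from pathlib import Path


class DataDirLock:
    """Hold an exclusive lock on <data_dir>/.agent.lock while the agent runs."""

    def __init__(self, data_dir: Path):
        self._path = data_dir / ".agent.lock"
        self._fd = None

    def acquire(self) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        fd = os.open(self._path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # Non-blocking: fail immediately if another process holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            raise RuntimeError(f"{self._path.parent} is already in use by another agent")
        self._fd = fd

    def release(self) -> None:
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


# Usage (sketch):
# lock = DataDirLock(Path("/data/skytale"))
# lock.acquire()  # before SkytaleChannelManager(...); release() after mgr.close()
```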

Random identity on restart

If your agent generates a random identity on each start, it creates a new MLS participant every time. Use a stable, deterministic identity:

# WRONG: random identity
import os
mgr = SkytaleChannelManager(identity=os.urandom(16))

# CORRECT: stable identity
mgr = SkytaleChannelManager(identity=b"order-processor-1")
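One way to keep identities stable without hardcoding them is to read the identity from deployment config and refuse to start if it is missing. In the sketch below, `stable_identity` and the `AGENT_IDENTITY` variable are hypothetical. For the TypeScript SDK, the table above lists `SKYTALE_IDENTITY`.

```python
import os
from typing import Mapping, Optional


def stable_identity(env: Optional[Mapping[str, str]] = None) -> bytes:
    """Read a fixed identity from deployment config; never generate a random one."""
    env = os.environ if env is None else env
    name = env.get("AGENT_IDENTITY", "").strip()  # hypothetical variable name
    if not name:
        # Failing loudly beats silently creating a new MLS participant.
        raise RuntimeError("AGENT_IDENTITY is not set; refusing to start with a random identity")
    return name.encode("utf-8")
```

Set `AGENT_IDENTITY=order-processor-1` under each service's `environment:` in docker-compose, then construct the manager with `SkytaleChannelManager(identity=stable_identity(), data_dir="/data/skytale")`.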