Multi-Agent Deployment
Docker setup
Dockerfile

    FROM python:3.12-slim

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .

    # Persistent MLS state directory: mount as a volume
    VOLUME /data/skytale
    ENV SKYTALE_DATA_DIR=/data/skytale

    CMD ["python", "agent.py"]

docker-compose.yml

    services:
      agent-alice:
        build: .
        environment:
          - SKYTALE_API_KEY=${SKYTALE_API_KEY}
          - SKYTALE_RELAY=https://relay.skytale.sh:5000
          - SKYTALE_API_URL=https://api.skytale.sh
        volumes:
          - alice-data:/data/skytale
        restart: unless-stopped

      agent-bob:
        build: .
        environment:
          - SKYTALE_API_KEY=${SKYTALE_API_KEY}
          - SKYTALE_RELAY=https://relay.skytale.sh:5000
          - SKYTALE_API_URL=https://api.skytale.sh
        volumes:
          - bob-data:/data/skytale
        restart: unless-stopped

    volumes:
      alice-data:
      bob-data:

Environment variable reference
| Variable | Description | Default |
|---|---|---|
| SKYTALE_API_KEY | API key for authentication (sk_live_...) | none (required for production) |
| SKYTALE_RELAY | Relay server URL | https://relay.skytale.sh:5000 |
| SKYTALE_API_URL | API server URL | https://api.skytale.sh |
| SKYTALE_DATA_DIR | Directory for MLS state persistence | ~/.skytale/<identity_hex> |
| SKYTALE_MOCK | Enable mock mode (1, true, yes) | false |
| SKYTALE_IDENTITY | Default agent identity (TypeScript SDK) | none |
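The variables in the table above can be validated once at startup, before any agent code runs. The following is a minimal sketch using only the standard library. `SkytaleEnv` and `load_env` are hypothetical helpers, not part of the SDK. Two behaviours are assumptions of this sketch: the mock flag is parsed case-insensitively, and a missing API key is treated as fatal unless mock mode is on.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Values the table lists as enabling mock mode.
TRUTHY = {"1", "true", "yes"}


@dataclass(frozen=True)
class SkytaleEnv:
    """Hypothetical container for the documented environment variables."""
    api_key: Optional[str]
    relay: str
    api_url: str
    data_dir: Optional[str]  # None means "leave the default to the SDK"
    mock: bool


def load_env(env: Mapping[str, str] = os.environ) -> SkytaleEnv:
    """Read the documented variables and apply the documented defaults."""
    # Assumption: comparison is case-insensitive and ignores whitespace.
    mock = env.get("SKYTALE_MOCK", "").strip().lower() in TRUTHY
    api_key = env.get("SKYTALE_API_KEY") or None
    # Assumption: fail fast when no key is set outside mock mode.
    if api_key is None and not mock:
        raise RuntimeError(
            "SKYTALE_API_KEY is required unless SKYTALE_MOCK is enabled"
        )
    return SkytaleEnv(
        api_key=api_key,
        relay=env.get("SKYTALE_RELAY", "https://relay.skytale.sh:5000"),
        api_url=env.get("SKYTALE_API_URL", "https://api.skytale.sh"),
        data_dir=env.get("SKYTALE_DATA_DIR") or None,
        mock=mock,
    )
```

Calling `load_env()` with no arguments reads the process environment. Passing a plain dict makes the function easy to unit test.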
Channel lifecycle
Bootstrap: creator agent

One agent must create the channel first. This agent becomes the MLS group owner and processes join requests from other agents.

    from skytale_sdk import SkytaleChannelManager

    # The creator must run first and stay running
    creator = SkytaleChannelManager(
        identity=b"orchestrator",
        data_dir="/data/skytale",
    )
    creator.create("myorg/agents/tasks")

    # Generate invite tokens for other agents
    tokens = []
    for _ in range(5):
        token = creator.invite("myorg/agents/tasks", max_uses=1, ttl=3600)
        tokens.append(token)
    # Distribute tokens to joining agents (env vars, config files, API, etc.)

Scaling: token-based join
Other agents join using invite tokens. They don’t need to know each other, only the creator’s channel.

    import os
    from skytale_sdk import SkytaleChannelManager

    worker = SkytaleChannelManager(
        identity=b"worker-1",
        data_dir="/data/skytale",
    )

    token = os.environ["SKYTALE_INVITE_TOKEN"]
    worker.join_with_token("myorg/agents/tasks", token)

    # Now the worker can send and receive on the channel
    worker.send("myorg/agents/tasks", "worker-1 online")

Token distribution patterns
| Pattern | When to use | How |
|---|---|---|
| Environment variable | Fixed agent set, known at deploy time | SKYTALE_INVITE_TOKEN in docker-compose |
| Config file | Agents read config on startup | JSON/TOML file mounted as a volume |
| API endpoint | Dynamic agent scaling | Creator exposes an endpoint that returns tokens |
| Shared store | Kubernetes or orchestrator-managed | Store tokens in a Secret or KV store |
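The first two patterns in the table can be combined so that an agent prefers the environment variable and falls back to a mounted config file. This is a sketch under stated assumptions: `resolve_invite_token` is a hypothetical helper, the config file is assumed to be JSON, and the `invite_token` key name is made up for illustration.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_invite_token(
    env: Mapping[str, str] = os.environ,
    config_path: Optional[Path] = None,
) -> str:
    """Return an invite token from the environment, else from a JSON file."""
    # Pattern 1: token injected via SKYTALE_INVITE_TOKEN (e.g. docker-compose).
    token = env.get("SKYTALE_INVITE_TOKEN", "").strip()
    if token:
        return token

    # Pattern 2: token read from a config file mounted as a volume.
    # Assumption: the file is JSON with a top-level "invite_token" key.
    if config_path is not None and config_path.is_file():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        token = str(data.get("invite_token", "")).strip()
        if token:
            return token

    raise LookupError("no invite token found in environment or config file")
```

The returned string would then be passed to `join_with_token` as in the worker example. Raising `LookupError` when nothing is found lets the agent fail at startup rather than run without a channel.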
Multi-use tokens
For auto-scaling scenarios where new agents spin up dynamically:

    # Create a reusable token (up to 100 uses, valid for 24 hours)
    token = creator.invite("myorg/agents/tasks", max_uses=100, ttl=86400)
    # Store in a shared secret manager

Failure recovery
Agent restart with persistent state

If an agent restarts but its data_dir is intact (volume-mounted), it can resume without rejoining:

    from skytale_sdk import SkytaleChannelManager

    # On restart, recreate the manager with the same identity and data_dir
    mgr = SkytaleChannelManager(
        identity=b"worker-1",
        data_dir="/data/skytale",  # Volume-mounted, survived restart
    )
    # Channels are restored from local MLS state
    # No need to rejoin: just start sending/receiving
    mgr.send("myorg/agents/tasks", "worker-1 back online")

Agent restart without state (data_dir lost)
If the data_dir is lost, the agent must rejoin with a new invite token:

    from skytale_sdk import SkytaleChannelManager

    mgr = SkytaleChannelManager(
        identity=b"worker-1",
        data_dir="/data/skytale",
    )

    # Need a fresh invite token from the channel owner.
    # get_new_token_from_orchestrator() is a placeholder for your own
    # token distribution mechanism (see "Token distribution patterns").
    new_token = get_new_token_from_orchestrator()
    mgr.join_with_token("myorg/agents/tasks", new_token)

Graceful shutdown
Always call close() before stopping an agent so its background threads shut down cleanly:

    import signal
    import sys

    def shutdown(signum, frame):
        # mgr is the SkytaleChannelManager created at startup
        mgr.close()
        sys.exit(0)

    signal.signal(signal.SIGTERM, shutdown)
    signal.signal(signal.SIGINT, shutdown)

Or use the context manager:

    with SkytaleChannelManager(identity=b"agent") as mgr:
        mgr.create("org/ns/chan")
        run_agent_loop(mgr)
    # mgr.close() called automatically on exit

In a Node.js agent (TypeScript SDK), register the equivalent handlers:

    process.on("SIGTERM", () => {
      mgr.close();
      process.exit(0);
    });

    process.on("SIGINT", () => {
      mgr.close();
      process.exit(0);
    });

Production checklist
Storage
- data_dir is mounted as a persistent volume (Docker volume, EBS, PVC)
- Each agent has its own unique data_dir, never shared
- Backup strategy for data_dir if channel state is critical
- Volume permissions allow the agent process to read/write
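The permissions item in the storage checklist can be verified by a preflight check that runs before the manager is created. The sketch below is one way to do it with the standard library. `check_data_dir` is a hypothetical helper, and it only proves the directory is writable right now. It cannot tell whether the path is backed by a persistent volume.

```python
import tempfile
from pathlib import Path


def check_data_dir(path: str) -> Path:
    """Ensure the state directory exists and is writable by this process."""
    data_dir = Path(path)
    # Create the directory if the image or volume did not provide it.
    data_dir.mkdir(parents=True, exist_ok=True)
    try:
        # Write probe: the temporary file is deleted when the block exits.
        with tempfile.NamedTemporaryFile(dir=data_dir, prefix=".write-probe-"):
            pass
    except OSError as exc:
        raise PermissionError(
            f"{data_dir} is not writable by this process"
        ) from exc
    return data_dir
```

A typical call would be `check_data_dir(os.environ["SKYTALE_DATA_DIR"])` at the top of the agent entry point, so a misconfigured volume surfaces as a startup error rather than as a later decryption failure.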
Authentication
- SKYTALE_API_KEY set via environment variable, not hardcoded
- API key stored in a secrets manager (AWS Secrets Manager, Vault, K8s Secret)
- Separate API keys per environment (dev, staging, production)
Networking
- Relay URL configured correctly (SKYTALE_RELAY)
- Outbound TCP 5000 (gRPC) and UDP 4433 (QUIC) allowed through firewall
- Health check: curl https://relay.skytale.sh:5000/health
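Where curl is not available in a slim image, the outbound TCP rule can be probed from Python. This sketch parses the relay URL and attempts a plain TCP connection. `relay_endpoint` and `tcp_reachable` are hypothetical helpers. The probe covers only the TCP port and says nothing about UDP 4433 or about the relay's /health response.

```python
import socket
from urllib.parse import urlsplit


def relay_endpoint(relay_url: str) -> tuple[str, int]:
    """Extract (host, port) from a relay URL such as SKYTALE_RELAY."""
    parts = urlsplit(relay_url)
    if not parts.hostname:
        raise ValueError(f"relay URL has no host: {relay_url!r}")
    # Fall back to the scheme's usual port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `tcp_reachable(*relay_endpoint(os.environ["SKYTALE_RELAY"]))` returning False at startup points to a firewall or DNS problem rather than an SDK issue.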
Reliability
- Graceful shutdown handler (SIGTERM/SIGINT) calls mgr.close()
- Container restart policy set to unless-stopped or always
- Error handling for all 5 exception types (see Error handling guide)
- Logging configured to capture SkytaleError codes for monitoring
Agent identity
- Each agent has a unique, stable identity across restarts
- Identity is deterministic (not randomly generated on each start)
- No two running agents share the same identity
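One way to satisfy all three identity items is to derive the identity from deployment configuration instead of generating it at runtime. The sketch below builds it from a role name and a replica index. `stable_identity`, `AGENT_ROLE`, and `AGENT_INDEX` are hypothetical names chosen for illustration and are not SDK settings. The container hostname is deliberately not used, because under Docker it defaults to the container ID and changes when the container is recreated.

```python
import os
from typing import Mapping


def stable_identity(env: Mapping[str, str] = os.environ) -> bytes:
    """Build a deterministic identity such as b"order-processor-1"."""
    # Hypothetical variables: set both per service in the deployment config.
    role = env.get("AGENT_ROLE", "").strip()
    index = env.get("AGENT_INDEX", "").strip()
    if not role or not index.isdigit():
        raise ValueError("set AGENT_ROLE and a numeric AGENT_INDEX")
    # int() normalises values like "01" so the identity stays canonical.
    return f"{role}-{int(index)}".encode("utf-8")
```

The result can be passed as the `identity` argument of the manager. Uniqueness then reduces to giving each replica a distinct role and index pair.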
Common pitfalls
Losing data_dir

This is the most common production issue. Without persistent MLS state, agents cannot decrypt messages on existing channels. Always use a volume mount.

    # docker-compose.yml, WRONG: no named volume
    services:
      agent:
        build: .
        # MLS state is not on a named volume and is lost when the container is recreated

    # docker-compose.yml, CORRECT: persistent volume
    services:
      agent:
        build: .
        volumes:
          - agent-data:/data/skytale
        environment:
          - SKYTALE_DATA_DIR=/data/skytale

    volumes:
      agent-data:

Agents on separate machines need invite tokens
Agents cannot join a channel just by knowing its name. The MLS protocol requires a cryptographic handshake mediated by invite tokens. There is no “open” channel that anyone can join.

    # WRONG: trying to join without a token
    bob.create("org/ns/chan")  # This creates a NEW channel instead of joining Alice's

    # CORRECT: use an invite token from the channel owner
    token = alice.invite("org/ns/chan")
    # Send token to Bob (env var, API call, config file, etc.)
    bob.join_with_token("org/ns/chan", token)

Sharing data_dir between agents
Each agent identity must have its own data_dir. Sharing causes MLS epoch conflicts and decryption failures.

    # WRONG: shared volume
    services:
      agent-1:
        volumes:
          - shared-data:/data/skytale  # Both agents write to the same dir
      agent-2:
        volumes:
          - shared-data:/data/skytale  # MLS state conflicts

    # CORRECT: separate volumes
    services:
      agent-1:
        volumes:
          - agent1-data:/data/skytale
      agent-2:
        volumes:
          - agent2-data:/data/skytale

    volumes:
      agent1-data:
      agent2-data:

Random identity on restart
If your agent generates a random identity on each start, it creates a new MLS participant every time. Use a stable, deterministic identity:

    # WRONG: random identity
    import os
    mgr = SkytaleChannelManager(identity=os.urandom(16))

    # CORRECT: stable identity
    mgr = SkytaleChannelManager(identity=b"order-processor-1")