Organized AI · Educational Series

OpenClaw Architecture

Learn the OpenClaw personal AI assistant from the ground up — Gateway, channels, agents, skills, nodes, Tailscale, and security. A phased, interactive learning journey. The lobster way. 🦞

6 Phases
20+ Channels
3 Learning Modes
Node.js 22.16+ Required
Phase 1 — The Gateway
The single WebSocket control plane at :18789. Sessions, channel adapters, tool routing, event streaming, config hot-reload.
ws://127.0.0.1:18789 · openclaw.json · Control UI · openclaw doctor
Phase 2 — Channels
20+ messaging adapters. DM pairing policy. Group routing. One internal protocol. The Pi agent never knows which channel sent the message.
Telegram · Slack · Discord · WhatsApp · dmPolicy: pairing
Phase 3 — Agents + Sessions
The Pi agent in RPC mode. The 6-step agent loop. Main vs. group sessions. Thinking levels. Agent-to-agent communication via sessions_* tools.
Pi agent RPC · AGENTS.md · SOUL.md · sessions_send · /compact · /new · /think
Phase 4 — Skills + Nodes
Skills = what the agent knows. Nodes = what the agent can do. ClawHub, workspace skills, macOS/iOS/Android companion apps, and node.invoke.
ClawHub · SKILL.md · macOS node · iOS Voice Wake · node.invoke
Phase 5 — Tailscale + Security
Gateway runs on loopback by design. Tailscale Serve for tailnet access, Funnel for public. Six security layers. Docker sandboxing. Default-safe, opt-in to exposure.
Tailscale Serve · Tailscale Funnel · Docker sandbox · openclaw doctor
Phase 6 — Deployment
Deploy to Cloudflare Pages, run a remote Linux gateway, use Docker Compose, or Nix. Ship your own version of this doc site from what you built.
Cloudflare Pages · Docker Compose · Nix · pnpm deploy

Quick Start

# Clone and run
git clone https://github.com/Organized-AI/openclaw-education
cd openclaw-education
pnpm install
pnpm setup
✓ Mode selected: Full Hands-On

pnpm phase:1   # The Gateway
pnpm phase:2   # Channels
pnpm phase:3   # Agents + Sessions
pnpm phase:4   # Skills + Nodes
pnpm phase:5   # Tailscale + Security
pnpm phase:6   # Deployment + Doc Site
pnpm deploy

Learning Modes

Docs Only

Browse architecture diagrams, phase breakdowns, and concept explainers. No OpenClaw install needed. Just Node.js + pnpm.

★ Diagrams + Docs

Generate real Mermaid diagrams and annotated examples. OpenClaw optional. The recommended starting point for workshops.

Full Hands-On

Live Gateway + channel exercises + diagrams. Connect real channels on your machine. OpenClaw required.

Key Concepts

01 — Gateway

WebSocket server at :18789. Not the AI — the router. Think post office, not postman.

02 — Channels

20+ adapters normalize inbound messages into one OpenClaw format. The agent never knows the difference.

03 — Pi Agent

Runs the loop: receive → think → tools → stream reply. Connects via RPC. Channel-agnostic.

04 — Sessions

main = your 1:1. Groups get isolated sessions. Each carries model, history, usage stats.

05 — Skills

Markdown files injected into agent context. Three tiers: bundled, managed (ClawHub), workspace.

06 — Nodes

macOS/iOS/Android companion apps exposing device capabilities via node.invoke.

Prerequisites

Tool       | Version      | Required   | Notes
Node.js    | 22.16+ or 24 | ✓ Required | Runtime for all phase scripts
pnpm       | 8+           | ✓ Required | npm install -g pnpm
Git        | any          | ✓ Required | Clone and commit phase outputs
OpenClaw   | latest       | ★ Hands-On | npm install -g openclaw@latest
Tailscale  | any          | ◌ Optional | Phase 5 remote exercises
Cloudflare |              | ◌ Optional | Phase 6 deployment

Full Architecture

Messaging Surfaces — 20+ channels
WhatsApp · Telegram · Slack · Discord · Signal · iMessage · Matrix · IRC · +13
        │
        ▼  Channel Adapters (each runs inside the Gateway process)
GATEWAY — ws://127.0.0.1:18789
  Sessions       main + groups, per-thread
  Tool Router    browser, canvas/cron, nodes/hooks
  Event Stream   typing, presence, media/usage
  Config         hot-reload, openclaw.json
        │
        ├──────────────────────┬──────────────────────────┐
        ▼                      ▼                          ▼
PI AGENT (RPC mode)      CLI                        Companion Nodes
think → tools            openclaw agent             macOS — menu bar
→ stream reply           openclaw send              iOS — Voice Wake
                         openclaw doctor            Android — full node
                                                    system.run/notify

Agent Workspace — ~/.openclaw/workspace/
  AGENTS.md · SOUL.md · TOOLS.md · skills/ (ClawHub + workspace)
Tailscale layer (optional)
  Serve → tailnet-only access · Funnel → public + password auth
Phase 1

The Gateway

The single WebSocket control plane that powers all of OpenClaw. It is not the AI. It is the router.

pnpm phase:1
Why this exists

Without a Gateway, every channel would have to talk directly to the AI.

Picture trying to let Telegram, Slack, Discord, WhatsApp, Signal, iMessage and 15 other services each learn how to talk to an AI model. Each one has its own quirks — different message formats, different login systems, different ways of handling attachments. Teaching all of them the same AI dialect would be a maintenance nightmare.

The Gateway solves this by sitting in the middle. Channels talk to the Gateway in their own native language. The Gateway translates everything into one shared internal format, hands that to the AI, and translates the AI's reply back. Add a new channel? You only teach it the Gateway's language — the AI never has to change.

What Is the Gateway?

A local WebSocket server at ws://127.0.0.1:18789 that receives messages from channels, routes them to the Pi agent, and delivers replies back through the correct channel adapter.

In plain terms
WebSocket is a way for two programs to keep a phone line open and chat back and forth in real time. Unlike a normal web request (ask → answer → hang up), a WebSocket stays connected, so the Gateway can stream the AI's reply word-by-word as it's being written.

127.0.0.1 (and its nickname loopback) means "this computer only." Nothing on the outside internet can reach it. It's like a phone extension that only rings inside your house.

:18789 is the port number — the specific "door" on your computer where the Gateway is listening. Other programs knock on that door to connect.
Mental model: The Gateway is the post office. The Pi agent is the letter writer. The channels are the mailboxes. The post office does not write letters — it routes them.

What Connects to the Gateway

Channel Adapters

In-process adapters that pipe inbound messages in and deliver replies out. One per connected channel.

Pi Agent (RPC)

Runs the agent loop over WebSocket. Streams replies token by token.

In plain terms
RPC (Remote Procedure Call) is a fancy phrase for "one program asks another program to run a function, as if it were its own." So when the Gateway says "think about this message," the Pi agent runs its thinking, and the answer comes back as if the Gateway had done it itself. The two programs just happen to live at different ends of a WebSocket.

Streaming token by token means you see the reply appear word-by-word (actually piece-by-piece — a "token" is roughly ¾ of a word) instead of waiting for the whole answer to arrive. Same reason ChatGPT feels live.
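A minimal sketch of what token-by-token streaming looks like to a consumer. The chunking rule here (split on whitespace) is illustrative only, not OpenClaw's actual tokenizer, and the function name is made up for the example.

```javascript
// Hypothetical sketch: stream a reply in small chunks instead of one blob.
// Splitting on whitespace stands in for real model tokenization.
function* streamTokens(reply) {
  for (const piece of reply.split(/(?<=\s)/)) {
    yield piece; // each yield would be one WebSocket frame to the Gateway
  }
}

// A consumer sees the reply build up incrementally:
let shown = "";
for (const tok of streamTokens("Here is your calendar for today.")) {
  shown += tok; // in a real channel adapter this would update the chat bubble
}
```

The point is that the receiving side can render partial text as soon as the first chunk arrives, rather than waiting for the full reply.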

CLI Tools

openclaw agent, openclaw send — connect as WS clients.

Companion Nodes

macOS, iOS, Android apps connect as WS clients to expose device capabilities.

Control UI + WebChat

Browser-based session management at http://127.0.0.1:18789.

What the Gateway Manages

Sessions          | One per conversation thread, persist across restarts
Channel Adapters  | One per connected channel, normalizes native format
Tool Routing      | Routes tool calls to browser, Canvas, cron, webhooks, nodes
Event Streaming   | Typing, presence, media pipeline, usage tracking
Config Hot-Reload | Watches ~/.openclaw/openclaw.json, applies without restart
In plain terms
Hot-reload means the Gateway is constantly glancing at the config file. The moment you save a change, it picks up the new settings — no stopping and starting the program. It's like editing your phone's ringtone while the phone stays on; you don't need to reboot.
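One way to picture a hot-reload step in code: after re-reading the file, compare the old and new config and restart only the subsystems whose section changed. This is a sketch under assumptions — `changedSections` is a made-up name, and the real Gateway's reload logic is not shown here; only the config section names mirror openclaw.json.

```javascript
// Sketch: report which top-level config sections changed between saves,
// so only those subsystems need to pick up new settings.
function changedSections(oldCfg, newCfg) {
  const keys = new Set([...Object.keys(oldCfg), ...Object.keys(newCfg)]);
  return [...keys].filter(
    (k) => JSON.stringify(oldCfg[k]) !== JSON.stringify(newCfg[k])
  );
}

const before = { agent: { model: "anthropic/claude-sonnet-4-6" }, gateway: { bind: "loopback" } };
const after  = { agent: { model: "anthropic/claude-sonnet-4-6" }, gateway: { bind: "loopback", port: 18789 } };
console.log(changedSections(before, after)); // → ["gateway"]
```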
Walk-through

A Telegram message's 7-second journey

  1. t+0.0s Your friend types "what's on my calendar today?" into your Telegram bot and hits send.
  2. t+0.1s Telegram's servers deliver the message to the Telegram adapter running inside your Gateway. The adapter strips away Telegram-specific fields and produces a plain OpenClaw message struct.
  3. t+0.2s The Gateway looks up the session for this thread (or creates one). History, model preference and usage counters come along for the ride.
  4. t+0.3s Gateway sends an RPC call over WebSocket to the Pi agent: "here's the message, here's the session context."
  5. t+0.5s The Pi agent loads AGENTS.md + SOUL.md + active skills, reasons about what to do, and decides it needs to call the calendar tool.
  6. t+1.2s Tool result comes back. Agent starts generating a reply.
  7. t+1.4s First token streams back to the Gateway → Telegram adapter → shown as "typing…" then as partial text in the chat.
  8. t+6.8s Last token arrives. Session is updated with the new history, token count, and cost. Your friend reads the answer.

Gateway Architecture

Inbound — 20+ channels
WhatsApp · Telegram · Slack · Discord · Signal · iMessage · Matrix · +13 more
        │
        ▼  Channel Adapters (normalize to OpenClaw struct)
GATEWAY :18789
  Sessions │ Tool Router │ Event Stream │ Config Reload
        │
        ▼  RPC over WebSocket
PI AGENT — 6-step loop
  1. Receive   2. Load context   3. Think
  4. Tool calls   5. Stream reply   6. Update session

Config Reference — openclaw.json

The Gateway reads all configuration from ~/.openclaw/openclaw.json. It hot-reloads most settings without a restart.

Minimal Config

{
  "agent": {
    "model": "anthropic/claude-sonnet-4-6"
  }
}

Full Config Anatomy

Section           | Key Fields                    | Notes
agent             | model, thinkingLevel          | AI model and reasoning depth
channels.telegram | botToken, allowFrom, dmPolicy | Create bot via @BotFather
channels.slack    | botToken, appToken            | Socket Mode required
channels.discord  | token                         | Message Content Intent required
gateway           | bind, port, auth, tailscale   | bind always "loopback"
agents.defaults   | workspace, sandbox.mode       | sandbox: "none" or "non-main"
browser           | enabled, color                | Browser automation toggle

Critical Rules

gateway.bind = "loopback" — always. Never 0.0.0.0. Use Tailscale for remote access.
  • gateway.tailscale.mode = off / serve / funnel
  • channels.*.dmPolicy = "pairing" (recommended) or "open"
  • agents.defaults.sandbox.mode = "none" (main) or "non-main" (Docker for groups)
Common mistakes
  • Setting bind: "0.0.0.0" to "make it work on my network." That exposes the Gateway to every device on your LAN with zero auth. Use Tailscale Serve instead — same convenience, proper identity check.
  • Editing openclaw.json inside a text editor that writes atomically. Some editors (nano with certain settings, some IDEs) briefly delete-then-recreate the file. Hot-reload can miss that. If config changes aren't taking, save a second time or run openclaw doctor.
  • Assuming the Gateway is the AI. It isn't. If the AI gives a bad answer, look at AGENTS.md and the session — the Gateway just moved the message. Debugging at the wrong layer wastes hours.
  • Running two Gateways on the same port. The second one silently fails to bind. Check with lsof -i :18789 if a channel won't connect.
Phase 2

Channels

20+ messaging channels. One internal protocol. The Pi agent never knows which channel a message came from.

pnpm phase:2
Why this exists

Messaging apps are wildly different under the hood.

Telegram speaks Markdown. Slack wants JSON "blocks" with structured layout. Discord uses embeds and rich cards. WhatsApp (via Baileys) operates through reverse-engineered protocols that barely look like a normal API. If the AI had to know all of this, it would spend more energy on formatting than thinking.

Adapters are tiny translators. Each one speaks one channel's dialect on the outside and the same OpenClaw format on the inside. That lets you add a new messenger without teaching the AI a single new trick — and lets the AI behave identically no matter where the human typed the message.

Channel Routing

Each channel adapter runs inside the Gateway process. It normalizes inbound messages into OpenClaw's internal format and translates outbound replies back to each channel's native format. Telegram gets Markdown. Slack gets Block Kit. Discord gets embeds. The agent never knows the difference.

In plain terms
Adapter = a plug converter. Your laptop charger is the same no matter which country you travel to; the wall adapter changes. The AI is your laptop. Telegram, Slack and Discord are different wall sockets.

Normalize means "put it in a standard shape." A Telegram message and a Discord message come in with different field names, different timestamp formats, different ways to reference the sender. Normalizing rewrites them so downstream code only sees one shape: { sender, thread, text, attachments, timestamp }.

DM stands for Direct Message — a 1:1 conversation (not a group channel).
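Normalization can be sketched as two tiny translator functions producing the same shape. The raw payload field names below are simplified stand-ins, not the real Telegram or Discord wire formats — only the target struct `{ sender, thread, text, attachments, timestamp }` comes from the text above.

```javascript
// Illustrative adapter normalization: two hypothetical raw payloads collapse
// into one OpenClaw-style struct. Field names on the inputs are made up.
function normalizeTelegram(raw) {
  return {
    sender: String(raw.from.id),
    thread: String(raw.chat.id),
    text: raw.text,
    attachments: [],
    timestamp: raw.date * 1000, // seconds → milliseconds
  };
}

function normalizeDiscord(raw) {
  return {
    sender: raw.author.id,
    thread: raw.channel_id,
    text: raw.content,
    attachments: raw.attachments ?? [],
    timestamp: Date.parse(raw.timestamp), // ISO string → milliseconds
  };
}

const a = normalizeTelegram({ from: { id: 42 }, chat: { id: 7 }, text: "hi", date: 1700000000 });
const b = normalizeDiscord({ author: { id: "42" }, channel_id: "7", content: "hi", timestamp: "2023-11-14T22:13:20Z" });
// Downstream code sees identical shapes regardless of source channel.
```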

Routing Path

1. User sends message: on any channel (Telegram, Slack, etc.)
2. Channel adapter receives: grammY, Bolt, discord.js, Baileys...
3. Normalize to OpenClaw struct: common format regardless of source
4. Gateway finds/creates session: based on thread ID + channel
5. Dispatch to Pi agent: RPC over WebSocket
6. Agent streams reply: token by token
7. Route back through adapter: translate to channel-native format
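Step 4 of the routing path can be sketched as a lookup keyed by channel plus thread ID. The key format and session fields below are assumptions for illustration, not OpenClaw's internal storage.

```javascript
// Sketch: one session per (channel, thread) pair, created on first contact.
const sessions = new Map();

function findOrCreateSession(channel, threadId) {
  const key = `${channel}:${threadId}`;
  if (!sessions.has(key)) {
    sessions.set(key, { key, history: [], tokenCount: 0, cost: 0 });
  }
  return sessions.get(key);
}

const s1 = findOrCreateSession("telegram", "12345");
const s2 = findOrCreateSession("telegram", "12345"); // same thread → same session
const s3 = findOrCreateSession("discord", "12345");  // same ID, other channel → new session
```

This is why the same question from two channels lands in two separate histories: the channel is part of the key.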

Supported Channels

WhatsApp

Baileys adapter. QR code login via openclaw channels login.

Telegram

grammY adapter. Create bot via @BotFather, add botToken to config.

Slack

Bolt adapter. Socket Mode required. Install to workspace with bot scopes.

Discord

discord.js adapter. Message Content Intent required.

Signal

signal-cli adapter. Local Signal client required.

iMessage

BlueBubbles adapter. Requires BlueBubbles Server on macOS.

Plus: Google Chat, Microsoft Teams, Matrix, IRC, LINE, Mattermost, Nostr, Tlon, Twitch, Zalo, WeChat, WebChat, and more.

Walk-through

A Slack thread's round trip

  1. step 1 A teammate writes "@OpenClaw summarize yesterday's incident thread" in #ops on Slack.
  2. step 2 Slack fires a Socket Mode event to the Bolt adapter (the piece of code that speaks Slack). Bolt hands the raw event to OpenClaw.
  3. step 3 The adapter normalizes: it extracts the text, resolves the thread ID, strips the @mention, and builds a generic OpenClaw message.
  4. step 4 Gateway finds the group session keyed to this thread (or creates one with a fresh history). Group sessions are separate from your personal main session — your teammates can't see your private chats.
  5. step 5 Pi agent thinks. Streams back a reply as plain text plus optional Markdown.
  6. step 6 The Bolt adapter converts that into Block Kit — Slack's structured format for headings, bullets, dividers — and posts it in the thread.
  7. step 7 The agent has no memory that this came from Slack. If the same question came from Discord tomorrow, the logic would be identical; only the output format would differ.

DM Pairing Policy

The default security model for direct messages. Unknown senders get a pairing code — the bot ignores them until you approve.

DM Pairing Flow

Inbound DM arrives
    │
    ▼
Sender in allowFrom list?
    ├─ YES → Create session
    └─ NO  → check dmPolicy
               ├─ "open"    → Create session
               └─ "pairing" → Send pairing code, wait for approval
                                │
                                ▼
                  openclaw pairing approve telegram <code>
                                │
                                ▼
                  Add to allowFrom → Create session
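The pairing flow reduces to a small decision function. The return-value labels below are illustrative, not OpenClaw's internal API; only the allowFrom / dmPolicy semantics come from the flow itself.

```javascript
// Sketch of the DM pairing decision: known senders and "open" policy get a
// session; strangers under "pairing" get a code and are otherwise ignored.
function dmDecision(senderId, { allowFrom = [], dmPolicy = "pairing" } = {}) {
  if (allowFrom.includes(senderId)) return "create-session";
  if (dmPolicy === "open") return "create-session";
  return "send-pairing-code";
}

dmDecision("alice", { allowFrom: ["alice"] });   // → "create-session"
dmDecision("mallory", { dmPolicy: "pairing" });  // → "send-pairing-code"
dmDecision("mallory", { dmPolicy: "open" });     // → "create-session"
```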

Channel Setup Guides

Telegram

  1. Create bot via @BotFather on Telegram
  2. Add botToken to channels.telegram in openclaw.json
  3. Restart gateway

Slack

  1. Create app at api.slack.com
  2. Enable Socket Mode
  3. Add bot scopes (chat:write, app_mentions:read, etc.)
  4. Install to workspace

Discord

  1. Create bot at Discord Developer Portal
  2. Enable Message Content Intent
  3. Invite with bot + applications.commands scopes

WhatsApp

openclaw channels login
Scan QR code with WhatsApp mobile app

Verify

openclaw doctor
✓ Gateway running
✓ Telegram connected
✓ Slack connected
Key insight: The Gateway is channel-agnostic at the session layer. Adding a new channel is config-only — no code changes, no agent modifications.
Common mistakes
  • Forgetting Discord's "Message Content Intent" toggle. Discord ships bots blind to message text by default (privacy-first). If your bot only reacts to slash commands and ignores plain replies, this is why. Flip the intent in the Discord Developer Portal.
  • Leaving dmPolicy: "open" in production. Anyone who finds your bot's handle can DM it. The pairing code flow costs one extra message but blocks random senders.
  • Publishing a bot's username before configuring allowFrom. Indexers and Telegram bot directories pick up new bots quickly. Configure pairing before the bot goes live.
  • Expecting real-time group chat without requireMention: true. Without it, the bot tries to respond to every message in a busy channel — annoying at best, spam-banned at worst.
  • Hitting Slack rate limits during testing. Slack caps apps at ~1 msg/sec per channel. If streaming tokens arrive faster, they get queued or dropped. The Bolt adapter buffers automatically, but be patient.
Phase 3

Agents + Sessions

The Pi agent connects to the Gateway in RPC mode, receives messages, runs the agent loop, and streams replies.

pnpm phase:3
Why this exists

An AI model by itself is amnesia plus zero hands.

Raw language models don't remember anything between messages. They can't click buttons, open files, or run commands. Ask a model directly "what's on my calendar?" and it will make something up — it has no way to actually look.

The Pi agent wraps the model in a loop that gives it memory (sessions), tools (real functions it can call), and personality (AGENTS.md + SOUL.md). Sessions mean every conversation picks up where it left off. Tools mean it can actually do things. The 6-step loop is just the choreography that ties those together.

Agent Architecture

In plain terms
Context window = how much text the model can "see" at once. Think of it as a whiteboard with a fixed size. Every message, every tool result, every instruction takes up space. When the whiteboard fills up, something has to be erased or summarized — that's what /compact does.

Token ≈ a chunk of text, usually ¾ of a word. "hello" is 1 token; "antidisestablishmentarianism" is ~6. Models are billed per token in and per token out. That's why longer conversations cost more.

Thinking level controls how much the model reasons privately before replying. Higher = smarter answers but slower and pricier. "off" means no extra reasoning; "xhigh" means the model may spend thousands of internal tokens planning before it writes a single word you see.
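The "token ≈ ¾ of a word" rule of thumb turns into back-of-envelope cost math. Both function names and the per-million-token prices in the example are placeholders, not real model pricing.

```javascript
// Rough token estimate from word count (1 word ≈ 0.75 tokens is the doc's
// rule of thumb, inverted: tokens ≈ words / 0.75).
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).length;
  return Math.ceil(words / 0.75);
}

// Cost = input tokens at one rate + output tokens at another, per million.
function estimateCostUSD(inputText, outputTokens, pricePerMTokIn, pricePerMTokOut) {
  const inTok = estimateTokens(inputText);
  return (inTok * pricePerMTokIn + outputTokens * pricePerMTokOut) / 1_000_000;
}

estimateTokens("what's on my calendar today?"); // 5 words → 7 tokens (estimate)
```

Useful for reasoning about why long histories get expensive: every turn re-sends the whole history as input tokens.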

The Agent Loop (6 Steps)

Pi Agent — RPC Mode

1. RECEIVE  message + session context from Gateway
2. LOAD     prompt context: AGENTS.md + SOUL.md + TOOLS.md + active skills + session history
3. THINK    optional reasoning: off │ minimal │ low │ medium │ high │ xhigh
4. TOOLS    execute tool calls: browser │ canvas │ cron │ system.run │ sessions_* │ node.invoke
5. STREAM   reply token by token → Gateway → Channel
6. UPDATE   append history, update token count + cost; prune if context window exceeded
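The 6-step loop can be sketched as one function with pluggable stubs. Every name here (`agentTurn`, `loadContext`, `think`, `runTool`, `emit`) is an illustrative stand-in, not the real Pi agent API — only the order of steps comes from the loop itself.

```javascript
// Sketch of one agent turn: load → think → tools → stream → update.
function agentTurn(session, message, { loadContext, think, runTool, emit }) {
  const context = loadContext(session);              // 2. LOAD workspace + history
  const plan = think(context, message);              // 3. THINK (reasoning level applies here)
  const results = plan.toolCalls.map(runTool);       // 4. TOOLS
  const reply = plan.reply(results);
  for (const token of reply.split(" ")) emit(token); // 5. STREAM token by token
  session.history.push({ message, reply });          // 6. UPDATE history + usage
  return reply;
}

// Exercising the loop with stubs:
const session = { history: [] };
const streamed = [];
const reply = agentTurn(session, "what's on my calendar?", {
  loadContext: (s) => s.history,
  think: () => ({ toolCalls: ["calendar.today"], reply: (r) => `Found ${r.length} tool result.` }),
  runTool: (name) => ({ tool: name, ok: true }),
  emit: (t) => streamed.push(t),
});
```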

Agent Workspace

Located at ~/.openclaw/workspace/. These files define your agent's personality and capabilities:

File                   | Purpose
AGENTS.md              | Your instructions — applies universally across all channels
SOUL.md                | Personality, values, tone of voice
TOOLS.md               | Tool descriptions and usage guidance
skills/<name>/SKILL.md | Per-skill instructions (see Phase 4)
Key insight: The agent does not know which channel it is responding to. AGENTS.md applies universally. This is by design — your assistant behaves consistently whether the message came from Telegram, Slack, or Discord.

Session Model

Session Types

main

Your personal 1:1 with the assistant. Full tool access. No sandboxing by default. One per Gateway instance.

Group Sessions

One per group chat thread. Keyed by group ID. Can be sandboxed independently via Docker containers.

Session Fields

Field           | Description
model           | Which AI model to use for this session
thinkingLevel   | off / minimal / low / medium / high / xhigh
verboseLevel    | Controls response detail level
sendPolicy      | Controls message sending behavior
groupActivation | How the agent activates in group chats
history         | Conversation history for context
tokenCount      | Tokens used in this session
cost            | Running cost for this session

Chat Commands

Command        | What It Does
/new           | Wipe history entirely — fresh session
/compact       | Summarize and compress history
/think <level> | Change thinking depth for this session
/verbose       | Toggle verbose output
/usage         | Show token usage and costs
/status        | Show session status
/activation    | Configure group activation mode

Agent-to-Agent Communication

The sessions_* tools enable multi-agent coordination:

  • sessions_list — all active sessions with metadata
  • sessions_history — transcript of a specific session
  • sessions_send — message another session
Walk-through

What happens when you type /compact

  1. step 1 You've been chatting for two hours. The session's history field has ballooned to 40,000 tokens. The model's context window only holds 200,000, but every new message is now expensive — and the oldest bits aren't useful anymore.
  2. step 2 You type /compact. The Gateway recognizes this as a chat command (not a prompt for the AI) and pauses normal routing.
  3. step 3 It asks the model to summarize the conversation: "produce a dense recap of everything above so we can continue coherently." This uses tokens, but far fewer than you're about to save.
  4. step 4 The returned summary (maybe 2,000 tokens) replaces the old history. Your next message rides into the model with a fresh, tight context.
  5. step 5 Critical details (names, decisions, files opened) survive in the summary. Small talk and false starts are gone. Cost per turn drops sharply.
  6. step 6 Compare with /new, which throws history away entirely — faster and cheaper, but the agent forgets everything. Use /compact for long projects, /new when switching topics.
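The compaction step above reduces to a small piece of logic: when history grows past a budget, swap it for a summary entry. The function shape and threshold are assumptions for illustration; the real command asks the model for the summary, which the stub below stands in for.

```javascript
// Sketch of /compact: replace a bloated history with one summary entry.
function compact(session, summarize, budget = 8000) {
  if (session.tokenCount <= budget) return session; // nothing to do yet
  const summary = summarize(session.history);
  session.history = [{ role: "system", text: `Summary of earlier conversation: ${summary}` }];
  session.tokenCount = Math.ceil(summary.length / 4); // rough token estimate
  return session;
}

const session = { history: new Array(200).fill({ role: "user", text: "..." }), tokenCount: 40000 };
compact(session, () => "two-hour debugging session; fixed the adapter bug");
// history collapses to one summary entry; tokenCount drops sharply
```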
Common mistakes
  • Cranking thinkingLevel to xhigh for every question. You pay for every thinking token even when the question is trivial. Use medium as a default; bump up for genuinely hard problems.
  • Writing a 3,000-word AGENTS.md. Every instruction fights for context space. Shorter, sharper rules beat sprawling manifestos — the model actually follows them.
  • Treating group sessions like main. Group sessions are sandboxed and have fewer tools by default. If a group agent "can't do" something you can do in DM, check the sandbox config before blaming the agent.
  • Surprise at cost spikes after a long debugging session. Every re-try reprocesses the whole history. Run /usage occasionally; run /compact when it climbs.
  • Editing AGENTS.md mid-conversation and expecting instant effect. Changes apply on the next turn — not retroactively to messages already in history.
Phase 4

Skills + Nodes

Skills = what the agent knows. Nodes = what the agent can do.

pnpm phase:4
Why this exists

Knowing something and being able to do it are different problems.

You could teach the agent a new workflow by retraining the model — but that takes weeks, GPUs, and huge datasets. Or you could let the agent run arbitrary shell commands on your laptop — but that's a security nightmare. Neither works for everyday users.

OpenClaw splits the problem: Skills are just text (markdown files) that get dropped into the model's context so it learns how to handle specific situations. Nodes are tiny companion apps on your devices that expose safe, explicit capabilities (take a photo, get location, run this specific command). The agent reads a skill to know what to do, then calls a node to actually do it.

Skills Platform

Skills are markdown files injected into the agent's context. They teach the agent specific workflows without changing the model. Think employee training manuals, not model fine-tuning.

In plain terms
Markdown is just plain text with light formatting (# for headings, * for bullets). Skills are literally markdown files you can edit with any text editor.

Injected into context means the skill's text is automatically pasted into the model's "whiteboard" before it starts thinking. The model reads it the same way it reads your message. No compile step, no magic — just text the model sees.

Fine-tuning (what this isn't) means permanently altering the model's weights — expensive, slow, and risky. Skills are reversible by deleting a file.
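"Injected into context" is literal string concatenation. A minimal sketch, assuming a simple layout — the function name and prompt ordering are illustrative, not the agent's actual context builder; the file names mirror the workspace described in this phase.

```javascript
// Sketch: skills are plain strings pasted into the prompt before the model runs.
function buildContext({ agentsMd, soulMd, skills, history }) {
  return [
    agentsMd,
    soulMd,
    ...skills.map((s) => `# Skill: ${s.name}\n${s.body}`),
    ...history.map((h) => `${h.role}: ${h.text}`),
  ].join("\n\n");
}

const ctx = buildContext({
  agentsMd: "# AGENTS.md\nBe concise.",
  soulMd: "# SOUL.md\nFriendly tone.",
  skills: [{ name: "gtm-audit", body: "When asked to audit GTM tags: ..." }],
  history: [{ role: "user", text: "audit my GTM tags" }],
});
// ctx is just text — delete the skill file and the next turn's context omits it.
```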

Three Skill Tiers

Bundled

Ship with OpenClaw. Always available. Includes: browser automation, Canvas, cron jobs.

Managed (ClawHub)

Community registry at clawhub.com. Install on demand. Curated and versioned.

Workspace

Your skills at ~/.openclaw/workspace/skills/<name>/SKILL.md. Active immediately, no restart.

Creating a Workspace Skill

# Create a custom skill
mkdir -p ~/.openclaw/workspace/skills/gtm-audit
cat > ~/.openclaw/workspace/skills/gtm-audit/SKILL.md << 'EOF'
# GTM Tag Audit

When asked to audit GTM tags:
1. Open the GTM container in the browser
2. List all tags with their trigger conditions
3. Check for duplicate or conflicting tags
4. Verify conversion tracking setup
5. Report findings in a structured table
EOF

# Active immediately — no restart needed

Nodes

Nodes are companion apps (macOS, iOS, Android) that pair with the Gateway via WebSocket and expose device-local capabilities. The agent invokes nodes via node.invoke — it never touches devices directly.

In plain terms
Companion app = a separate app installed on your device (menu bar on Mac, full app on phone) that listens for Gateway requests. Without it, the agent can't touch your camera, your GPS, your shell. With it, every capability is explicit and scoped — the phone's camera doesn't get exposed unless you install and authorize the iOS node.

node.invoke is the single tool the agent uses to call any device action: node.invoke("camera.snap", { side: "rear" }). The agent doesn't open cameras directly; it asks the node to do it and trusts the result.

macOS Node

Action        | Description
system.run    | Execute shell commands on the host machine
system.notify | macOS native notifications
canvas.*      | Canvas drawing and manipulation
camera.snap   | Take photos with connected camera
screen.record | Screen recording
location.get  | Device location

Elevated mode: type /elevated on for host permissions.

iOS Node

Voice Wake, Talk Mode, Canvas, camera, screen recording, Bonjour pairing.

Android Node

Chat + Voice + Canvas, camera, screen recording, notifications, location, SMS, contacts, calendar.

Node Invocation Flow

node.invoke — how the agent uses devices

Pi Agent
    │  node.invoke("system.run", { cmd: "..." })
    ▼
GATEWAY ─── routes to registered node
    │
    ▼
macOS Node (menu bar app) ─── executes on host
    │  { result: "...", exitCode: 0 }
    ▼
Pi Agent ─── processes result, continues loop
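The invocation flow can be sketched as a capability registry. The registry and dispatch shape below are assumptions for illustration — companion nodes register what they expose, and the agent's single `node.invoke` call routes through it; only the capability names mirror the tables above.

```javascript
// Sketch: capability → handler, registered by a companion app at pairing time.
const nodes = new Map();

function registerNode(capability, handler) {
  nodes.set(capability, handler);
}

function nodeInvoke(capability, args) {
  const handler = nodes.get(capability);
  if (!handler) throw new Error(`no node registered for ${capability}`);
  return handler(args); // the Gateway routes; the node executes on-device
}

// A macOS node registers its notify capability; the agent invokes it:
registerNode("system.notify", ({ title }) => ({ shown: title }));
const result = nodeInvoke("system.notify", { title: "Morning Briefing" });
```

A capability the agent asks for but no node has registered simply fails — the phone's camera doesn't exist to the agent until the iOS node is paired.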
Key insight: Skills = what the agent knows. Nodes = what the agent can do. A skill can orchestrate multiple node capabilities — e.g., a "morning briefing" skill that uses camera.snap, location.get, and system.notify together.
Walk-through

A "Morning Briefing" skill in action (7:00 AM)

  1. step 1 A cron trigger fires at 7:00 AM. Gateway wakes the Pi agent with: "run the morning briefing."
  2. step 2 Agent reads skills/morning-briefing/SKILL.md — your custom markdown file that says things like: "Get location. Pull weather. Summarize today's calendar. Snap a photo of the whiteboard in my office. Send everything to the macOS notify + save to Canvas."
  3. step 3 Agent calls node.invoke("location.get") on the iOS node. The phone app returns GPS coords.
  4. step 4 Agent calls a weather tool with those coords. Gets forecast.
  5. step 5 Agent calls a calendar tool (via a bundled skill) to fetch today's events.
  6. step 6 Agent calls node.invoke("camera.snap", { device: "office-mac" }). The macOS node captures a photo, uploads it, returns a URL.
  7. step 7 Agent composes a briefing, writes it to Canvas, and calls node.invoke("system.notify", { title: "Morning Briefing", body: "..." }). You see it on your Mac within seconds of waking up.
  8. step 8 None of this required new code. You edited one markdown file. The agent orchestrated four devices and three tools because the skill told it how.
Common mistakes
  • Writing a skill that says "always do X." Absolute rules confuse the model when X doesn't apply. Prefer "when the user asks for Y, do X" — conditional guidance.
  • Forgetting that workspace skills are active immediately. There's no install step. Drop a SKILL.md in ~/.openclaw/workspace/skills/<name>/ and the next agent turn sees it. (Also easy to forget: deleting the file removes the skill just as instantly.)
  • Putting secrets in SKILL.md. Skill text is injected into the model context every turn — it's not secret. Store API keys in openclaw.json, not in skill markdown.
  • Expecting macOS node capabilities to work without /elevated on. system.run and similar require explicit elevation per session. Without it, the node refuses — by design.
  • Installing every ClawHub skill you see. Each one takes context budget every turn. Keep only what you use; prune quarterly.
Phase 5

Tailscale + Security

Default-safe. Opt-in to exposure. Every increase in access requires explicit configuration.

pnpm phase:5
Why this exists

Your AI assistant knows everything about you — treat it like root access.

Your agent reads calendar events, drafts emails in your voice, can run shell commands, and holds session histories full of private detail. If a stranger grabs a connection to your Gateway, they aren't just reading chats — they can impersonate you through every connected channel.

OpenClaw's security isn't a single wall; it's a stack of defaults that keep exposure low unless you consciously opt in. Every layer starts closed. Every broader access mode requires explicit config. The goal: you can't accidentally leak yourself by skipping a setting — you'd have to choose to lower each shield in turn.

Tailscale + Gateway

The Gateway runs on loopback by design — only your machine can reach it. Tailscale provides secure remote access without opening ports.

In plain terms
Loopback means "this computer only." It's the opposite of 0.0.0.0, which means "anyone on any network I'm connected to." Loopback is a locked bedroom door; 0.0.0.0 is a broadcast tower.

Tailscale is a private network (VPN) that only your devices can join. It uses WireGuard (an encrypted tunnel) to make your laptop, phone, and work computer behave as if they're on the same LAN — even from different continents. Your identity is the key, not a password.

Tailnet = your personal Tailscale network. Devices you've added. No one else.

Funnel is Tailscale's optional "expose to the public internet" mode. It gives you a public HTTPS URL backed by Tailscale — useful for demos and webhooks, but no longer private. OpenClaw refuses to start Funnel without a password.

Sandbox = a walled-off execution environment (Docker container) where a process can read/write inside a contained filesystem but can't reach your main system.

Access Modes

Local Only

Default. gateway.bind: "loopback". Only processes on this machine can connect. No auth needed.

Tailscale Serve

Tailnet-only access. Tailscale identity = authentication. No password needed. Your devices on your tailnet.

Tailscale Funnel

Public internet access. Requires gateway.auth.mode: "password". OpenClaw refuses to start without it.

Serve Config

{ "gateway": { "bind": "loopback", "tailscale": { "mode": "serve", "resetOnExit": false } } }

Funnel Config

{ "gateway": { "bind": "loopback", "auth": { "mode": "password", "password": "your-strong-password-here" }, "tailscale": { "mode": "funnel" } } }

Network Topology

Three Access Modes

  LOCAL              TAILSCALE SERVE         TAILSCALE FUNNEL
  127.0.0.1          tailnet only            public internet
    │                     │                       │
    │               WireGuard tunnel        HTTPS → Funnel
    │                     │                       │
    │                     │                 Password check
    │                     │                 ┌─────┴─────┐
    │                     │               pass      fail → 401
    │                     │                 │
    └─────────────────────┴─────────────────┘
                          │
                          ▼
                   GATEWAY :18789
                          │
                  ┌───────┴───────┐
                  │               │
            main session    group sessions
            full access     Docker sandbox

Security Model — Six Layers

Layer 1 — Gateway Binding: loopback only. Never 0.0.0.0.
Layer 2 — DM Pairing: unknown senders get a pairing code. Bot ignores them until approved.
Layer 3 — Session Sandboxing: non-main sessions run in Docker containers.
Layer 4 — Funnel Password: hard enforcement. OpenClaw won't start without it.
Layer 5 — Tool Denylist: configurable per-agent tool restrictions.
Layer 6 — openclaw doctor: pre-flight health check after every config change.

Docker Sandbox — Tool Access

Allowed in Sandbox

bash, process, read, write, edit, sessions_*

Denied in Sandbox

browser, canvas, nodes, cron, discord, gateway

Security Checklist

  • ☐ gateway.bind: "loopback"
  • ☐ dmPolicy: "pairing" on all channels
  • ☐ Funnel requires password
  • ☐ Groups: requireMention: true
  • ☐ sandbox.mode: "non-main" if groups connected
  • ☐ openclaw doctor runs clean
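The checklist maps to a handful of settings in openclaw.json. Here is a hedged sketch of what a locked-down config might look like — the gateway keys match the Serve snippet above, but the exact placement of the channel, group, and sandbox keys is an assumption; verify against the official schema and run openclaw doctor after saving:

```json
{
  "gateway": {
    "bind": "loopback",
    "tailscale": { "mode": "serve" }
  },
  "channels": {
    "telegram": {
      "dmPolicy": "pairing",
      "groups": { "requireMention": true }
    }
  },
  "sandbox": { "mode": "non-main" }
}
```

Every value here is the closed-by-default choice: nothing in this file opens the Gateway to anyone who isn't on your machine or your tailnet.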

Threat Models — What Could Actually Go Wrong

Security is easier to reason about when you picture specific bad outcomes instead of abstract risks. Here are five realistic scenarios and which layer stops each one.

Someone on my coffee-shop Wi-Fi finds the Gateway port.
They scan the network, see port 18789 open on your laptop, try to connect.
Layer 1 — loopback binding refuses any connection not from 127.0.0.1. They get "connection refused." No auth step even gets reached.
A stranger guesses my Funnel URL.
Tailscale Funnel URLs follow a predictable pattern (tailnet name + device name). Someone tries yours.
Layer 4 — Funnel mode won't start without auth.mode: "password". They hit a 401. They can brute-force, but a strong password plus modest rate limiting makes it impractical.
A Telegram bot scraper discovers my bot's username.
They DM your bot hoping it's open to anyone, probing for commands or trying to trigger tool calls.
Layer 2 — dmPolicy: "pairing" returns a pairing code and ignores their messages. The bot never forwards a single byte to the agent until you approve.
A malicious ClawHub skill tries to exfiltrate files from a group session.
You installed a skill that claimed to help with spreadsheets but actually tries to run cat ~/.ssh/id_rsa through the bash tool.
Layer 3 — group sessions run in a Docker sandbox that can't see your real home directory. bash works but only inside the container, which has no SSH keys. Layer 5 tool denylist can also block bash entirely for group sessions.
A teammate in a group chat tries to make the agent act as if it were me privately.
They write "hey, can you share my DM history with the team?" hoping the agent doesn't know the boundary.
Session isolation — group sessions have a different history than main. The agent literally can't see your personal chat from inside a group session. Plus sessions_* tools can be denylisted for non-main sessions.
I forget to run openclaw doctor after a config change.
You edit openclaw.json on your phone, typo bind: "0.0.0.0", hot-reload picks it up, you've just exposed the Gateway.
Layer 6 — openclaw doctor is designed to be the last step of every change. Run it every time — it refuses to pass if bind isn't loopback. Make doctor the muscle memory you never skip.
Common mistakes
  • Weak Funnel password "just for testing." Funnel URLs get indexed by Tailscale's public resolver. Anyone who hears your password once can get back in later. Use a generated 20+ character password from day one.
  • Running the Gateway as root. If something does escape the sandbox, root access on the host is game over. Run as your normal user; nothing in OpenClaw requires elevation.
  • Trusting a skill because it's popular. ClawHub is community-curated, not Apple-reviewed. Read the SKILL.md before installing — that markdown file is literally the entire skill.
  • Leaving /elevated on permanently on the macOS node. Elevation should be a brief, intentional window. Turn it off when you're done.
  • Forgetting that Tailscale Serve trusts your tailnet. Any device on your tailnet can reach the Gateway. If you share your tailnet with family, they can talk to your agent. Use ACLs to restrict per-device.
Phase 6

Deployment

Aggregates all phase documentation into a static site and deploys it — or saves locally.

pnpm phase:6

Deployment Guide

scripts/generate-docs.js reads all PHASE-*/DOCUMENTATION/*.md and AGENT-HANDOFF/*.md, combines with index.html into OUTPUT/local/, and optionally pushes to Cloudflare Pages.

Cloudflare Pages

# Set credentials
export CLOUDFLARE_API_TOKEN="your-token"
export CLOUDFLARE_ACCOUNT_ID="your-account-id"

# Deploy
pnpm deploy
✓ Generated docs from 6 phases
✓ Deployed to openclaw-education.pages.dev

Local Fallback

pnpm docs
✓ Generated to OUTPUT/local/
open OUTPUT/local/index.html

Alternative Deployment Paths

Remote Linux Gateway

npm install -g openclaw@latest then openclaw onboard --install-daemon. Runs as a systemd user service.

Docker Compose

git clone openclaw && docker compose up -d. Container-based deployment.

Nix

nix run github:openclaw/nix-openclaw. Declarative configuration.

What You Built

By completing all 6 phases, you now have:

Running Assistant

  • Gateway daemon running at ws://127.0.0.1:18789
  • Connected channels (Telegram, Slack, Discord, WhatsApp, etc.)
  • Pi agent responding with personalized AGENTS.md and SOUL.md
  • At least one workspace skill active

Secure Remote Access

  • Tailscale Serve configured for tailnet access
  • DM pairing enabled on all channels
  • Docker sandboxing for non-main sessions
  • openclaw doctor passing all checks

Documentation Site

  • Deployed to Cloudflare Pages or saved locally
  • Generated from your own phase outputs

Complete Mental Model

The Full Picture

  Channels (20+)
        │
  Channel Adapters (normalize to OpenClaw struct)
        │
        ▼
  ┌──────────────────────────────────────┐
  │            GATEWAY :18789            │
  │ Sessions │ Tools │ Events │ Config   │
  └──────────────────┬───────────────────┘
                     │
                     ▼
  ┌──────────────────────────────────────┐
  │           PI AGENT (RPC)             │
  │  receive → think → tools → stream    │
  └──────────┬──────────┬────────────────┘
             │          │
     Skills (knows)   Nodes (does)
     bundled          macOS / iOS / Android
     ClawHub          system.run / camera
     workspace        Voice Wake / Canvas
             │          │
             └────┬─────┘
                  │
   Tailscale Serve / Funnel (secure remote access)

Where to Go Next

Official Docs
docs.openclaw.ai
OpenClaw Source
github.com/openclaw/openclaw
ClawHub
Community skills registry
Organized AI
github.com/Organized-AI
Reference

Glossary + FAQ

Every piece of jargon in OpenClaw, explained in plain English. Jump to a letter, or scroll. Click a term's code to see it in context elsewhere in the docs.

A
Adapter
A small piece of code that translates between one messaging service's native format (Telegram, Slack, Discord, etc.) and OpenClaw's shared internal format. One adapter per channel. Adapters run inside the Gateway process.
Agent loop
The 6 steps the Pi agent runs on every message: receive → load context → think → tool calls → stream reply → update session. This is the choreography that turns a "language model" into an "assistant that can do things."
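The loop reads naturally as code. A hedged sketch — every helper here (loadContext, callModel, runTool, stream) is hypothetical, standing in for real Pi agent internals; only the six-step order comes from the docs:

```javascript
// Hypothetical sketch of the 6-step agent loop described above.
// Helper names are invented for illustration; only the step order
// (receive → load context → think → tools → stream → update) is from the docs.
function runAgentTurn(session, message, deps) {
  session.history.push({ role: "user", text: message });      // 1. receive
  const context = deps.loadContext(session);                  // 2. load context (AGENTS.md, SOUL.md, skills)
  let reply = deps.callModel(context, session.history);       // 3. think
  while (reply.toolCall) {                                    // 4. tool calls (repeat until the model is done)
    const result = deps.runTool(reply.toolCall);
    reply = deps.callModel(context, session.history, result);
  }
  deps.stream(reply.text);                                    // 5. stream reply
  session.history.push({ role: "agent", text: reply.text });  // 6. update session
  return reply.text;
}
```

The point of the sketch: the model is only step 3. Everything that makes it an assistant — context, tools, streaming, memory — is the choreography around the model call.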
AGENTS.md
A markdown file at ~/.openclaw/workspace/AGENTS.md holding your instructions to the agent. Applies universally regardless of channel. Think of it as an employee handbook the agent re-reads before every turn.
allowFrom
The per-channel list of senders the bot will actually talk to. Unknown senders get a pairing code instead of a response.
B
bind
The address a server listens on. loopback (127.0.0.1) means "this computer only." 0.0.0.0 means "every network interface" — dangerous for the Gateway.
BlueBubbles
Third-party software that runs on a Mac and exposes iMessage through an API. OpenClaw's iMessage adapter talks to BlueBubbles; BlueBubbles talks to Apple.
C
Canvas
OpenClaw's shared drawing/whiteboard surface. A bundled skill. The agent can write, render diagrams, and link to Canvas items via tools.
Channel (sometimes "surface")
A messaging service where the agent can send/receive (Telegram, Slack, Discord, WhatsApp, etc.). OpenClaw ships 20+ channel adapters.
ClawHub
The community registry of managed skills at clawhub.com. Think: an app store for agent skills. Skills are versioned and curated.
Context window
How much text a model can "see" at once, measured in tokens. Every message, instruction, and tool result takes up space. When it fills, history must be compacted or wiped.
D
DM
Direct Message — a 1:1 conversation (as opposed to a group channel).
dmPolicy
Per-channel setting: "pairing" (default, safe) requires approval before a new sender gets replies; "open" talks to anyone.
Docker sandbox
A Docker container that isolates a group session's tool execution from your real filesystem. Tools like bash still work but only affect the container — not your laptop.
Doctor (openclaw doctor)
A CLI health-check that inspects your config, verifies bind is loopback, confirms Funnel has a password, tests channel connectivity, and flags anything risky. Run after every change.
F
Funnel (Tailscale Funnel)
Tailscale's "expose to the public internet" mode. Gives you a public HTTPS URL. OpenClaw refuses to start Funnel without a password — this is hard-enforced, not a suggestion.
G
Gateway
The WebSocket server at ws://127.0.0.1:18789. The router of OpenClaw — not the AI. It holds sessions, routes tool calls, streams events, hot-reloads config.
Group session
A session keyed to a group chat thread (one per group). Separate history from your main. Typically sandboxed and with reduced tool access.
groupActivation
How the agent decides to speak in a group: mention (only when @-mentioned), reply (when directly replied to), always (every message). mention is safest in busy rooms.
H
Hot-reload
The Gateway watches openclaw.json and re-applies most settings the moment you save — no restart. Some settings (port, bind) still require a restart.
L
Loopback
The network address 127.0.0.1 (also called localhost). Means "this machine only." The opposite of 0.0.0.0. OpenClaw always binds the Gateway here.
M
main (session)
Your personal 1:1 session with the agent. Full tool access. No sandboxing by default. One per Gateway instance.
Markdown
Plain text with lightweight formatting (# headings, * bullets). Skills, AGENTS.md and SOUL.md are all just markdown files.
N
Node (companion node)
A separate app on a device (macOS menu bar, iOS app, Android app) that pairs with the Gateway and exposes device-local capabilities. The agent talks to nodes via node.invoke.
node.invoke
The single tool the agent uses to call any node capability: node.invoke("camera.snap", { ... }). Routed by the Gateway to the registered device.
Normalize
To rewrite an incoming message into OpenClaw's standard shape ({ sender, thread, text, attachments, timestamp }) regardless of which channel it came from.
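A sketch of what normalizing might look like for one channel. The input field names here are a simplified, assumed Telegram-style payload — only the target shape ({ sender, thread, text, attachments, timestamp }) is from the docs:

```javascript
// Illustrative adapter step: rewrite a (simplified) Telegram-style update
// into OpenClaw's standard message shape. Input field names are assumptions;
// the output shape is the one documented in this glossary entry.
function normalizeTelegram(update) {
  const msg = update.message;
  return {
    sender: String(msg.from.id),
    thread: String(msg.chat.id),
    text: msg.text ?? "",
    attachments: msg.photo ? [{ kind: "photo", ref: msg.photo }] : [],
    timestamp: msg.date * 1000, // Telegram timestamps are in seconds; normalize to ms
  };
}
```

One function like this per channel, and everything downstream of the Gateway can forget Telegram, Slack, or Discord ever existed.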
P
Pairing (DM pairing)
The default security flow for unknown senders: the bot replies with a pairing code and ignores further messages until you approve via openclaw pairing approve.
Pi agent
OpenClaw's agent runtime. Connects to the Gateway in RPC mode, runs the 6-step agent loop, streams replies. Channel-agnostic.
Port
A numbered "door" on a computer where a specific program listens. OpenClaw uses 18789. You can change it — but no two programs can share one.
R
RPC (Remote Procedure Call)
A pattern where one program calls a function that lives in another program, as if it were local. OpenClaw uses RPC over WebSocket between Gateway and Pi agent.
S
Sandbox
A contained execution environment (Docker) where risky tools like bash can run without touching your real filesystem. Non-main sessions are sandboxed by default.
Serve (Tailscale Serve)
Tailscale's "share within my tailnet" mode. Devices on your Tailscale network can reach the Gateway. No password needed — Tailscale identity is the auth.
Session
One conversation thread's state: model choice, thinking level, history, token count, cost. Persisted across Gateway restarts. main is your 1:1; groups get their own.
sessions_*
Tools the agent can use to coordinate across sessions: sessions_list, sessions_history, sessions_send. Enables multi-agent choreography.
Skill
A markdown file injected into the agent's context to teach a specific workflow. Three tiers: bundled (ships with OpenClaw), managed (ClawHub), workspace (your own).
Socket Mode
Slack's method for letting bots receive events over a persistent WebSocket (instead of public webhooks). Required for OpenClaw's Slack adapter.
SOUL.md
Markdown file defining the agent's personality, values, tone. Separate from AGENTS.md (which is task instructions) so you can swap one without disturbing the other.
Streaming
Sending a response piece-by-piece as it's generated, rather than waiting for the whole thing. Why replies appear word-by-word.
T
Tailnet
Your personal Tailscale network. The collection of devices you've added with your Tailscale identity.
Tailscale
A private-network (VPN) service built on WireGuard. Lets your devices talk securely without public IPs or open ports. OpenClaw uses it for remote access.
Thinking level
How much private reasoning the model does before replying: off / minimal / low / medium / high / xhigh. Higher = smarter + slower + more expensive.
Token
A chunk of text (~¾ of a word) — the unit models measure input and output in. Also the unit you're billed in.
Tool
A function the agent can call to affect the world: browser, Canvas, cron, node.invoke, sessions_*, file ops, shell, etc. The router in the Gateway dispatches each call.
TOOLS.md
Workspace file describing tool behaviors and usage guidance for your agent. The model reads it alongside AGENTS.md.
W
WebSocket
A persistent two-way connection between two programs. Unlike a normal web request (ask → answer → disconnect), a WebSocket stays open, so either side can send messages anytime — perfect for streaming tokens and real-time events.
WireGuard
A modern, fast, open-source VPN protocol. Tailscale is built on it.
Workspace
Your local folder at ~/.openclaw/workspace/ holding AGENTS.md, SOUL.md, TOOLS.md, and skills/<name>/SKILL.md. Everything that makes your agent yours.

Frequently Asked Questions

Is the Gateway sending my conversations to a cloud service?

The Gateway itself is 100% local. It runs on your machine and binds to loopback. However, the Pi agent calls an LLM provider (by default Anthropic's Claude) over HTTPS — so your prompts and replies travel to that provider for inference. If that bothers you, swap the model to a local one via agent.model (Ollama, LM Studio, etc.).

Can I use OpenClaw without any channels connected?

Yes. Run openclaw agent in a terminal or open the Control UI at http://127.0.0.1:18789. You'll get a WebChat session with no Telegram/Slack/Discord required. This is the recommended way to prototype skills before exposing them to real channels.

What's the difference between a skill and a tool?

Tools are functions the agent can call to affect the world — they're built into OpenClaw (browser, Canvas, cron, node.invoke, etc.). Skills are markdown text files that teach the agent when and how to use those tools for specific tasks. Tools = verbs; skills = playbooks.

Do I have to restart the Gateway after editing AGENTS.md or a skill?

No. Workspace skills and AGENTS.md are read fresh at the start of every agent turn. Save the file, send a new message, changes apply instantly. Same for SOUL.md and TOOLS.md. Only changes to openclaw.json at the port or bind level require a restart.

How much does running OpenClaw cost?

The software is free (MIT). The recurring cost is LLM inference — you pay the model provider (Anthropic, OpenAI, etc.) per token. A casual user might spend $5-$30/month; heavy users or always-on background agents run $50-$200/month. Run /usage in any session to see running totals. Use /compact and a lower thinkingLevel to control cost.

Can the agent read my files?

Only if you give it tools that can. The read/write/edit tools exist but are scoped to the current session's sandbox. The main session runs unsandboxed and can read your home directory — intentional, because you own main. Group sessions run sandboxed by default; they see a fake filesystem, not yours.

What happens if the Gateway crashes mid-conversation?

Sessions persist to disk. On restart, the Gateway reloads them and reconnects channel adapters. Messages that arrived during the outage are retrieved by the channel adapters on reconnect (Telegram, Slack and Discord all queue unread events for a while). You'll miss real-time streaming of an in-flight reply, but the conversation history stays intact.

Can two people share one OpenClaw instance?

Technically yes — share access via Tailscale Serve on a shared tailnet — but you'd also share main, which is usually a bad idea. The cleaner pattern is: each person runs their own Gateway, and they chat with each other's agents through group sessions in a shared Telegram/Discord/Slack channel. Agent-to-agent coordination via sessions_send works across that boundary.

How do I back up my agent?

Copy the ~/.openclaw/ directory. It contains your config (openclaw.json), workspace (AGENTS.md, SOUL.md, TOOLS.md, skills/), and session histories. Exclude the cache/ subfolder — it's regenerated automatically. Restoring is just dropping the folder back on a new machine and running pnpm install.

Is OpenClaw production-ready?

For a single user's personal assistant: yes. For customer-facing products or multi-tenant SaaS: not designed for that — no user isolation above the session layer, no billing, no SLA. Think of it as "your own Jarvis," not "a chatbot platform."
