Architecture & Deployment Topology

TameFlare consists of two main components: the control plane (Next.js) and the gateway (Go). This page covers how they interact, deployment topologies, and network requirements.


Components

┌─────────────────────────────────────────────────────────┐
│  Control Plane (Next.js, port 3000)                     │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ Dashboard│  │ API (/v1)│  │ Auth     │               │
│  │ (React)  │  │ (routes) │  │ (bcrypt) │               │
│  └──────────┘  └──────────┘  └──────────┘               │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ Policy   │  │ Audit    │  │ Token    │               │
│  │ Engine   │  │ Ledger   │  │ Signing  │               │
│  └──────────┘  └──────────┘  └──────────┘               │
│                    │                                    │
│              ┌─────┴─────┐                              │
│              │ SQLite DB │                              │
│              │ (local.db)│                              │
│              └───────────┘                              │
└─────────────────────────────────────────────────────────┘
        ▲                          ▲
        │ GATEWAY_SERVICE_TOKEN    │ HTTP API
        ▼                          │
┌─────────────────────────────┐    │
│  Gateway (Go, port 9443)    │    │
│                             │    │
│  ┌──────────┐ ┌──────────┐  │    │
│  │ Proxy    │ │Connectors│  │    │
│  │ (HTTP/S) │ │ (5 built-│  │    │
│  └──────────┘ │  in)     │  │    │
│  ┌──────────┐ └──────────┘  │    │
│  │Credential│ ┌──────────┐  │    │
│  │ Vault    │ │Permission│  │    │
│  └──────────┘ │ Checker  │  │    │
│               └──────────┘  │    │
│  ┌───────────────────────┐  │    │
│  │ SQLite (gateway.db)   │  │    │
│  └───────────────────────┘  │    │
└─────────────────────────────┘    │
        ▲                          │
        │ HTTP_PROXY / HTTPS_PROXY │
        │                          │
┌───────┴──────────────────────────┴──┐
│  Agent Process                      │
│  (Python, Node.js, Go, shell, etc.) │
└─────────────────────────────────────┘

Deployment topologies

Single-node (recommended for most users)

All components on one machine. Simplest setup, lowest latency.

┌─────────────────────────────────┐
│  Your server / laptop           │
│                                 │
│  Control plane  ←→  Gateway     │
│  (port 3000)       (port 9443)  │
│       ↑                ↑        │
│       │                │        │
│    Browser          Agent(s)    │
└─────────────────────────────────┘
  • Pros: Zero network config, localhost communication, single backup target
  • Cons: Single point of failure, gateway and control plane share resources
  • Best for: Solo developers, small teams, CI/CD pipelines

Split deployment (control plane + gateway on separate hosts)

For teams that want the dashboard accessible from a different network than the agents.

┌──────────────────┐     HTTPS      ┌──────────────────┐
│  Dashboard host  │ ←────────────→ │  Agent host      │
│                  │                │                  │
│  Control plane   │                │  Gateway         │
│  (port 3000)     │                │  (port 9443)     │
│  + SQLite        │                │  + SQLite        │
│  + Turso (opt)   │                │  + Agents        │
└──────────────────┘                └──────────────────┘
  • Requires: GATEWAY_URL set to the gateway's HTTPS address, CONTROL_PLANE_URL set on the gateway, and GATEWAY_SERVICE_TOKEN shared between both (see the sketch after this list)
  • Pros: Dashboard accessible from corporate network, agents isolated
  • Cons: Network latency for token verification, HTTPS required between components
  • Best for: Teams with network segmentation requirements
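
A minimal sketch of the environment on each host, assuming GATEWAY_URL is configured on the control plane side; the hostnames are illustrative:

# On the dashboard host (control plane)
export GATEWAY_URL=https://agents.internal:9443       # illustrative hostname
export GATEWAY_SERVICE_TOKEN=<shared-secret>

# On the agent host (gateway)
export CONTROL_PLANE_URL=https://dash.internal:3000   # illustrative hostname
export GATEWAY_SERVICE_TOKEN=<shared-secret>          # must match the control plane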

Docker Compose

# docker-compose.yml ships with TameFlare
services:
  web:    # Control plane on port 3000
  gateway: # Go proxy on port 9443

Both services communicate via Docker's internal network. No HTTPS needed between them.
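
A quick smoke test after starting the stack, assuming the Compose file publishes both ports to the host:

docker compose up -d
docker compose ps                      # both services should report "running"
curl -s http://localhost:9443/health   # gateway health endpoint
curl -sI http://localhost:3000         # control plane answers on port 3000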


Network requirements

Ports

| Port  | Component     | Direction             | Purpose                                       |
|-------|---------------|-----------------------|-----------------------------------------------|
| 3000  | Control plane | Inbound               | Dashboard UI + API                            |
| 9443  | Gateway       | Inbound (from agents) | Proxy endpoint                                |
| 9443+ | Gateway       | Inbound (from agents) | Per-agent proxy ports (allocated dynamically) |

Firewall rules

| From          | To                  | Protocol   | Required?                           |
|---------------|---------------------|------------|-------------------------------------|
| Browser       | Control plane :3000 | HTTPS      | Yes (dashboard)                     |
| Agent         | Gateway :9443       | HTTP       | Yes (proxy)                         |
| Gateway       | Control plane :3000 | HTTP/HTTPS | Yes (token verification)            |
| Gateway       | External APIs       | HTTPS      | Yes (forwarded requests)            |
| Control plane | Gateway :9443       | HTTP       | Optional (status checks, approvals) |
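
As a sketch, equivalent host firewall rules for a gateway machine using ufw; the subnets, addresses, and the upper port bound are illustrative:

# Agents on the local subnet may reach the gateway's proxy port range
sudo ufw allow from 10.0.1.0/24 to any port 9443:9543 proto tcp

# Optionally let the control plane host reach the gateway for status checks
sudo ufw allow from 10.0.2.5 to any port 9443 proto tcp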

DNS

The gateway resolves upstream API domains (e.g., api.github.com) at request time. Ensure the gateway host has DNS access to all domains your agents need to reach.
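
A quick way to confirm this from the gateway host:

# Confirm upstream domains resolve from the gateway host
dig +short api.github.com
getent hosts api.github.com   # alternative if dig is not installed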


Blast radius

What happens if each component fails:

| Component failure | Impact | Agents affected? | Data loss? |
|---|---|---|---|
| Control plane down | Dashboard inaccessible, SDK mode fails, no new audit events | Proxy mode: agents continue (gateway has local permissions). SDK mode: agents blocked. | No — SQLite persists |
| Gateway down | All proxied requests fail (connection refused) | Yes — all agents on this gateway | No — SQLite persists |
| SQLite (control plane) corrupt | API errors, dashboard errors | SDK mode blocked. Proxy mode unaffected. | Audit log lost if no backup |
| SQLite (gateway) corrupt | Traffic log lost, permissions may fail | Depends on cached state | Traffic log lost if no backup |
| Network between components | Token verification fails | Proxy mode: gateway uses local permissions. SDK mode: blocked. | No |

Tip
The gateway is designed to operate independently when it cannot reach the control plane. Permission checks use the local SQLite database. Only ES256 token verification requires the control plane.

Scaling patterns

| Pattern | How | When |
|---|---|---|
| More agents per gateway | Each agent gets a port. ~50-100 agents per gateway. | Default — no config needed |
| More throughput | Run multiple gateways on separate hosts, each with its own agents | >500 req/s per gateway |
| Shared control plane | Use Turso (cloud SQLite) for the control plane DB; multiple gateways point to the same CONTROL_PLANE_URL | Multi-host deployments |
| High-availability dashboard | Reverse proxy (nginx) + multiple Next.js instances + Turso | Enterprise |


Component independence

The gateway and control plane are independent processes that can run separately:

| Scenario | Gateway | Control plane | What works |
|---|---|---|---|
| Both running | Online | Online | Everything — proxy, dashboard, SDK, audit |
| Gateway only | Online | Offline | Proxy mode works (local permissions, credential vault). No dashboard, no SDK mode, no new audit events. |
| Control plane only | Offline | Online | Dashboard, SDK mode, audit log. No proxy mode. |
| Neither | Offline | Offline | Nothing — agents cannot reach any API |

Why this matters

  • Gateway crash doesn't affect the dashboard — users can still view audit logs, manage policies, and register agents
  • Control plane crash doesn't affect proxied agents — the gateway has its own SQLite with cached permissions and credentials
  • Network partition — if the gateway can't reach the control plane, it continues operating with local state. Token verification is skipped (only permission-based decisions apply).

This independence means you can restart the control plane (e.g., for upgrades) without interrupting running agents.
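
For example, with the bundled docker-compose.yml you can restart the web service for an upgrade while the gateway container keeps serving its agents:

docker compose restart web       # proxied agents are unaffected
docker compose logs -f gateway   # confirm the gateway kept running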


Gateway health monitoring

Health endpoint

curl http://localhost:9443/health
# {"status": "ok", "uptime_seconds": 3600, "agents": 3, "connectors": 5}

Traffic log discrepancy detection

Compare the gateway's traffic log against the control plane's audit log to detect silent drops:

# Count gateway traffic entries for today
sqlite3 .TameFlare/gateway.db "SELECT COUNT(*) FROM traffic_log WHERE created_at > date('now');"
 
# Count control plane action events for today
sqlite3 apps/web/local.db "SELECT COUNT(*) FROM audit_events WHERE event_type LIKE 'action.%' AND created_at > date('now');"

If the gateway count is significantly higher than the control plane count, some requests may not be reaching the audit log (e.g., network issues between components, or requests handled entirely by the gateway without control plane involvement).
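
The same comparison as a one-shot script (a sketch; the paths assume the default single-node layout):

#!/bin/sh
# Flag a gap between what the gateway proxied and what reached the audit log today.
gw=$(sqlite3 .TameFlare/gateway.db \
  "SELECT COUNT(*) FROM traffic_log WHERE created_at > date('now');")
cp=$(sqlite3 apps/web/local.db \
  "SELECT COUNT(*) FROM audit_events WHERE event_type LIKE 'action.%' AND created_at > date('now');")
[ "$gw" -gt "$cp" ] && echo "WARN: gateway logged $gw requests, audit log has $cp"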

Silent drop detection

The gateway logs every request it processes, including those it blocks. Check for traffic log entries that record a received request but no allow/block decision:

# Requests with no decision logged (potential silent drops)
sqlite3 .TameFlare/gateway.db "SELECT COUNT(*) FROM traffic_log WHERE decision IS NULL AND created_at > date('now');"

A non-zero count indicates requests that were received but not fully processed — typically caused by gateway crashes during request handling.


Log rotation and disk management

Traffic log growth

| Usage                   | Daily growth | Monthly growth |
|-------------------------|--------------|----------------|
| Light (<100 req/day)    | <1 MB        | <10 MB         |
| Medium (1,000 req/day)  | ~5-10 MB     | ~150-300 MB    |
| Heavy (10,000+ req/day) | ~50-100 MB   | ~1.5-3 GB      |

Cleanup

Use the maintenance endpoint to prune old data:

curl -X POST \
  -H "Authorization: Bearer $MAINTENANCE_SECRET" \
  http://localhost:3000/api/maintenance/cleanup

This purges:

  • Audit events older than AUDIT_RETENTION_DAYS
  • Expired sessions
  • Used nonces older than 24 hours
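
To run the cleanup on a schedule, a crontab sketch (the secret value is illustrative):

# Hit the cleanup endpoint nightly at 02:00
MAINTENANCE_SECRET=change-me
0 2 * * * curl -fsS -X POST -H "Authorization: Bearer $MAINTENANCE_SECRET" http://localhost:3000/api/maintenance/cleanup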

The gateway traffic log is not automatically pruned by the maintenance job. To manage gateway disk usage:

# Check traffic log size
ls -lh .TameFlare/gateway.db
 
# Manual prune (keep last 30 days)
sqlite3 .TameFlare/gateway.db "DELETE FROM traffic_log WHERE created_at < date('now', '-30 days');"
sqlite3 .TameFlare/gateway.db "VACUUM;"
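
The manual prune can be scheduled the same way (a sketch; use an absolute path in cron, the one below is illustrative):

# Nightly at 03:00: keep the last 30 days of gateway traffic, then reclaim disk
0 3 * * * sqlite3 /srv/tameflare/.TameFlare/gateway.db "DELETE FROM traffic_log WHERE created_at < date('now', '-30 days'); VACUUM;"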

Disk monitoring recommendations

| Check           | Threshold | Action                                                       |
|-----------------|-----------|--------------------------------------------------------------|
| local.db size   | > 500 MB  | Run cleanup, reduce AUDIT_RETENTION_DAYS                     |
| gateway.db size | > 1 GB    | Prune traffic log, reduce retention                          |
| Disk free space | < 10%     | Immediate cleanup. See Failure Modes for disk-full behavior. |
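
The first two checks as a shell sketch (paths assume the default layout):

#!/bin/sh
# Warn when either database crosses its recommended threshold (sizes in MB).
[ "$(du -m apps/web/local.db | cut -f1)" -gt 500 ] \
  && echo "local.db > 500 MB: run cleanup or lower AUDIT_RETENTION_DAYS"
[ "$(du -m .TameFlare/gateway.db | cut -f1)" -gt 1024 ] \
  && echo "gateway.db > 1 GB: prune the traffic log"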


HA and scaling limitations

Current limitations

| Feature | Status | Details |
|---|---|---|
| High availability | Not supported | Single-node SQLite. No automatic failover. |
| Multi-node gateway | Not supported | Each gateway uses its own SQLite. No shared state between gateways. |
| Multi-node control plane | Supported (with Turso) | Use Turso (cloud SQLite) for a shared database across multiple Next.js instances. |
| Load balancing (gateway) | Not recommended | Agents bind to a specific gateway port. Load balancing would break agent-port mapping. |
| Load balancing (control plane) | Supported | Standard reverse proxy (nginx/Caddy) in front of multiple Next.js instances + Turso. |

Honest assessment

TameFlare is designed for single-node deployments with up to ~50-100 agents and ~500-1,000 req/s. For most teams, this is sufficient.

If you need:

  • HA for the dashboard — use Turso + multiple Next.js instances behind a reverse proxy
  • HA for the gateway — not currently possible. Run the gateway on a reliable host with process supervision (systemd, Docker restart policy; see the sketch after this list)
  • Multi-region — not currently possible. Deploy one TameFlare instance per region, each with its own gateway and control plane
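
For the gateway supervision point, a minimal sketch using Docker's restart policy (the container name is illustrative):

# Keep the gateway container running across crashes and host reboots
docker update --restart unless-stopped tameflare-gateway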

Planned improvements

  • Gateway state sync (share permissions across multiple gateways)
  • Turso-backed gateway traffic log (replace local SQLite)
  • Active-passive gateway failover

Next steps