AI Agent Safety Guide
Human-in-the-loop AI agents: a practical approval architecture
Human approval should not be a chat message bolted onto an autonomous workflow. It should be a durable state transition with clear policy, identity, timeouts, audit evidence, and a safe path for the agent to resume.
Which agent actions need human approval?
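Approval should be driven by consequence, not by how impressive or uncertain the model sounds. A useful policy evaluates the action type, environment, data sensitivity, maximum impact, and reversibility. The agent's self-reported confidence can be one more input, but only to add scrutiny: high confidence should never waive a review that the consequences call for.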
Read-only development tasks may be safe to execute automatically. Production deletion, financial transactions, credential changes, public messages, and actions involving regulated data should usually stop for review. Critical actions may require two reviewers rather than one.
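As a rough illustration of consequence-driven policy, the sketch below maps structured facts about an action to a review requirement. The field names, the action categories, and the 0.5 confidence threshold are assumptions made for this example, not part of any particular product, and maximum impact is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    ONE_REVIEWER = "one_reviewer"
    TWO_REVIEWERS = "two_reviewers"


@dataclass(frozen=True)
class ProposedAction:
    action_type: str         # hypothetical values: "read", "delete", "payment", ...
    environment: str         # hypothetical values: "development", "production"
    data_sensitivity: str    # hypothetical values: "internal", "regulated"
    reversible: bool
    agent_confidence: float  # self-reported, 0.0 to 1.0


# Assumed set of action types that always need a human decision.
HIGH_CONSEQUENCE = {"delete", "payment", "credential_change", "public_message"}


def evaluate(action: ProposedAction) -> Decision:
    high = action.action_type in HIGH_CONSEQUENCE
    production = action.environment == "production"

    # Irreversible, high-consequence production changes get two reviewers.
    if high and production and not action.reversible:
        return Decision.TWO_REVIEWERS
    # Consequence sets the floor, regardless of how confident the agent is.
    if high or action.data_sensitivity == "regulated":
        return Decision.ONE_REVIEWER
    if production and action.action_type != "read":
        return Decision.ONE_REVIEWER
    # Low confidence can only add scrutiny; it is checked last on purpose.
    if action.agent_confidence < 0.5:
        return Decision.ONE_REVIEWER
    return Decision.AUTO_APPROVE
```

Because confidence is consulted only after every consequence rule has passed, a confident agent cannot talk its way out of a review.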
The seven parts of a durable handoff
- Policy evaluation. Evaluate structured facts about the proposed action before execution.
- Immutable request context. Record what the agent wants to do, why, and which resources are affected.
- Durable review ticket. Store a ticket with explicit pending, approved, rejected, expired, and cancelled states.
- Reviewer notification. Send a concise card or message with enough context to make a decision without exposing unnecessary sensitive data.
- Authenticated decision. Verify reviewer identity and protect callbacks against replay or tampering.
- Safe resumption. Resume only the exact approved action, under the approved constraints, and reject stale or conflicting decisions.
- Audit evidence. Retain request IDs, timestamps, policy results, state changes, reviewer identity, and final execution outcome.
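The ticket, request context, and audit parts of the list above can be sketched as a small state machine. This is an in-memory illustration under stated assumptions: the class and field names are hypothetical, the context is only shallowly read-only, and a real system would persist both the ticket and its audit events.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Mapping

# Only a pending ticket may change state; every other state is terminal.
TRANSITIONS = {"pending": {"approved", "rejected", "expired", "cancelled"}}


class InvalidTransition(Exception):
    """Raised when a decision arrives for a ticket that is already settled."""


@dataclass
class ReviewTicket:
    request_id: str
    context: Mapping[str, Any]  # what the agent wants to do, why, and to what
    state: str = "pending"
    audit: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Copy the context and expose it through a read-only view.
        self.context = MappingProxyType(dict(self.context))

    def transition(self, new_state: str, actor: str) -> None:
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise InvalidTransition(f"{self.state} -> {new_state}")
        # Append-only audit event recorded alongside the state change.
        self.audit.append({
            "request_id": self.request_id,
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "from": self.state,
            "to": new_state,
        })
        self.state = new_state
```

Rejecting any transition out of a terminal state is what lets the system refuse stale or conflicting decisions, such as a late rejection arriving after an approval.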
Default-deny timeouts are safer
An unanswered approval is not approval. Set a deadline, expire the ticket, and default to denial when the deadline passes. The agent may create a new request with fresh context, but it should not silently reuse a stale approval.
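A minimal sketch of that rule, assuming the ticket stores a state string and a timezone-aware deadline (both names are illustrative):

```python
from datetime import datetime, timedelta, timezone


def resolve(state: str, deadline: datetime, now: datetime) -> str:
    """Return the state the agent must act on at time `now`."""
    # A pending ticket past its deadline is treated as expired, never approved.
    if state == "pending" and now >= deadline:
        return "expired"
    return state


def may_execute(state: str, deadline: datetime, now: datetime) -> bool:
    # Only an explicit approval allows execution; everything else is a denial.
    return resolve(state, deadline, now) == "approved"


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deadline = created + timedelta(minutes=15)
late = created + timedelta(minutes=16)
print(resolve("pending", deadline, late))      # expired
print(may_execute("pending", deadline, late))  # False
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the expiry rule easy to test.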
Make retries idempotent
Agents and notification workers retry. Without idempotency, a network timeout can create duplicate tickets or execute an approved action twice. Assign a stable idempotency key to each proposed action and enforce uniqueness at the data layer.
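One way to enforce this at the data layer is a unique constraint plus a conditional update, sketched here with SQLite from the Python standard library. The table layout, the `executing` state, and the choice to derive the key from the action content are assumptions for illustration. The action should carry something like a run ID (hypothetical here) so that two genuinely separate requests do not collapse into one.

```python
import hashlib
import json
import sqlite3


def idempotency_key(action: dict) -> str:
    # Canonical JSON, so the same action always hashes to the same key.
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tickets ("
        " id INTEGER PRIMARY KEY,"
        " idempotency_key TEXT NOT NULL UNIQUE,"
        " state TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def create_ticket(conn: sqlite3.Connection, action: dict) -> int:
    key = idempotency_key(action)
    with conn:  # commits on success, rolls back on error
        # A retry hits the UNIQUE constraint and is ignored instead of duplicated.
        conn.execute(
            "INSERT OR IGNORE INTO tickets (idempotency_key) VALUES (?)", (key,)
        )
    row = conn.execute(
        "SELECT id FROM tickets WHERE idempotency_key = ?", (key,)
    ).fetchone()
    return row[0]


def claim_execution(conn: sqlite3.Connection, ticket_id: int) -> bool:
    # Conditional update: only one caller can move approved -> executing.
    with conn:
        cur = conn.execute(
            "UPDATE tickets SET state = 'executing'"
            " WHERE id = ? AND state = 'approved'",
            (ticket_id,),
        )
    return cur.rowcount == 1
```

A worker that gets `False` back from `claim_execution` knows the action was not approved or was already claimed, and should not run it.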
Keep review context small and precise
A reviewer needs the proposed action, target, expected effect, reason, risk factors, and rollback plan. They do not need full conversation history, raw credentials, or unrelated customer data. Minimize notification payloads and keep sensitive detail inside the self-hosted review system.
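An allowlist is a simple way to keep notification payloads small. The sketch below assumes the request is a dictionary and uses hypothetical field names that mirror the items listed above. Anything not on the list never leaves the review system.

```python
# Fields a reviewer needs; everything else stays inside the review system.
REVIEW_FIELDS = (
    "action",
    "target",
    "expected_effect",
    "reason",
    "risk_factors",
    "rollback_plan",
)


def build_review_card(request: dict) -> dict:
    missing = [name for name in REVIEW_FIELDS if name not in request]
    if missing:
        # Refuse to notify with an incomplete card rather than guess.
        raise ValueError(f"request is missing review fields: {missing}")
    # Copy only allowlisted fields; history, credentials, etc. are dropped.
    return {name: request[name] for name in REVIEW_FIELDS}
```

An allowlist fails closed: a new sensitive field added to the request later is excluded by default, whereas a blocklist would leak it until someone remembered to update the list.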
Approval is not the final control
After approval, re-check preconditions. Confirm the resource still exists, the state has not changed, the approval has not expired, and the execution parameters still match the reviewed request. Use least-privilege credentials and record the actual outcome.
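These checks can be sketched as a function that returns every reason execution should stop. The `Approval` fields, the parameter digest, and the idea of a resource version string are assumptions for this example. A real system would read the current version from the target resource immediately before acting.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def digest(params: dict) -> str:
    # Stable fingerprint of the parameters the reviewer actually saw.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    params_digest: str
    resource_version: str  # hypothetical: version or ETag seen at review time
    expires_at: datetime


def precondition_failures(
    approval: Approval,
    params: dict,
    current_version: Optional[str],  # None means the resource is gone
    now: datetime,
) -> List[str]:
    failures = []
    if current_version is None:
        failures.append("resource no longer exists")
    elif current_version != approval.resource_version:
        failures.append("resource changed since review")
    if now >= approval.expires_at:
        failures.append("approval expired")
    if digest(params) != approval.params_digest:
        failures.append("parameters differ from reviewed request")
    return failures
```

The agent should proceed only when the returned list is empty, and should record any failures in the audit trail alongside the final outcome.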
A minimal production checklist
- Explicit policy inputs and versioned rules
- Persistent tickets and append-only audit events
- Authenticated reviewers and signed callbacks
- Idempotency keys and optimistic concurrency checks that reject conflicting updates
- Default-deny expiration and cancellation
- Health checks, request IDs, metrics, and structured logs
- Backups, TLS, secret management, and access controls
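The "signed callbacks" item can be illustrated with an HMAC over the callback body, a timestamp, and a nonce. This is a sketch under stated assumptions: the message layout, the five-minute window, and the in-memory nonce set are choices made for this example. It covers tampering and replay only; establishing who the reviewer is remains a separate step.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

MAX_SKEW_SECONDS = 300  # assumed freshness window


def sign(secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    # Timestamp and nonce are part of the signed message, so neither can be swapped.
    message = f"{timestamp}.{nonce}.".encode("utf-8") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(
    secret: bytes,
    timestamp: int,
    nonce: str,
    body: bytes,
    signature: str,
    seen_nonces: Set[str],
    now: Optional[float] = None,
) -> bool:
    now = time.time() if now is None else now
    # Reject callbacks outside the freshness window.
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(secret, timestamp, nonce, body)
    # Constant-time comparison on bytes avoids timing leaks.
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return False
    # Record the nonce only after the signature checks out, then block replays.
    if nonce in seen_nonces:
        return False
    seen_nonces.add(nonce)
    return True
```

In a deployed service the nonce set would need to live in shared storage with an expiry at least as long as the freshness window, so that replays are caught across workers and restarts.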
Self-hosted implementation
Agent Handoff Production Kit
Get the source package, REST API, Python SDK, MCP server, Feishu integration, audit trail, Docker deployment, and security documentation as a one-time purchase.