AI Agent Safety Guide

Human-in-the-loop AI agents: a practical approval architecture

Human approval should not be a chat message bolted onto an autonomous workflow. It should be a durable state transition with clear policy, identity, timeouts, audit evidence, and a safe path for the agent to resume.

Start with the decision policy. Use the free AI Agent Approval Policy Generator to create a local JSON draft.

Which agent actions need human approval?

Approval should be driven by consequence, not by how impressive or uncertain the model sounds. A useful policy evaluates structured facts: the action type, environment, data sensitivity, maximum impact, reversibility, and, as one input among several, the agent's confidence.

Read-only development tasks may be safe to execute automatically. Production deletion, financial transactions, credential changes, public messages, and actions involving regulated data should usually stop for review. Critical actions may require two reviewers rather than one.
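
As a rough illustration, the policy described above can be written as a small pure function over structured facts. This is a minimal sketch, not the output format of the policy generator: the field names, the HIGH_RISK_TYPES set, and the 10,000 impact threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    action_type: str        # hypothetical values: "read", "delete", "payment", ...
    environment: str        # hypothetical values: "development", "production"
    data_sensitivity: str   # hypothetical values: "public", "internal", "regulated"
    reversible: bool
    max_impact_usd: float


# Assumed set of action types that stop for review in production.
HIGH_RISK_TYPES = {"delete", "payment", "credential_change", "public_message"}

# Assumed threshold above which a second reviewer is required.
TWO_REVIEWER_IMPACT_USD = 10_000


def required_approvals(action: ProposedAction) -> int:
    """Return how many human reviewers must approve (0 means auto-execute)."""
    # Regulated data always stops for review, whatever the environment.
    if action.data_sensitivity == "regulated":
        return 1 if action.reversible else 2
    if action.environment == "production" and action.action_type in HIGH_RISK_TYPES:
        # Irreversible or high-impact actions are treated as critical.
        if not action.reversible or action.max_impact_usd >= TWO_REVIEWER_IMPACT_USD:
            return 2
        return 1
    # Read-only work in development may run without review.
    if action.action_type == "read" and action.environment == "development":
        return 0
    # Anything the policy does not recognise gets one reviewer by default.
    return 1
```

Note that the function never looks at how confident the model sounds. If confidence is used at all, it would be one more field on ProposedAction that can raise the review level but not lower it.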

The seven parts of a durable handoff

  1. Policy evaluation. Evaluate structured facts about the proposed action before execution.
  2. Immutable request context. Record what the agent wants to do, why, and which resources are affected.
  3. Durable review ticket. Store a ticket with explicit pending, approved, rejected, expired, and cancelled states.
  4. Reviewer notification. Send a concise card or message with enough context to make a decision without exposing unnecessary sensitive data.
  5. Authenticated decision. Verify reviewer identity and protect callbacks against replay or tampering.
  6. Safe resumption. Resume only the exact approved action, under the approved constraints, and reject stale or conflicting decisions.
  7. Audit evidence. Retain request IDs, timestamps, policy results, state changes, reviewer identity, and final execution outcome.
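
Step 5 is where many implementations are weakest, so here is one possible way to protect a decision callback against tampering and replay, using an HMAC signature, a timestamp window, and a one-time nonce. This is a sketch under assumptions: the function names, the signed field order, and the 300-second window are hypothetical, and the in-memory nonce set stands in for a durable store. It authenticates the callback sender; verifying the human reviewer's identity (for example through SSO) is a separate check.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed replay window


def sign_decision(secret: bytes, ticket_id: str, decision: str,
                  reviewer: str, timestamp: int, nonce: str) -> str:
    """Sign every field of the decision so none can be altered in transit."""
    # Assumes no field contains a newline; a real system might use a
    # length-prefixed or canonical JSON encoding instead.
    message = "\n".join([ticket_id, decision, reviewer, str(timestamp), nonce])
    return hmac.new(secret, message.encode(), hashlib.sha256).hexdigest()


def verify_decision(secret: bytes, ticket_id: str, decision: str, reviewer: str,
                    timestamp: int, nonce: str, signature: str,
                    seen_nonces: set, now: float = None) -> bool:
    """Return True only for a fresh, untampered, never-seen-before decision."""
    now = time.time() if now is None else now
    # Reject callbacks outside the allowed time window.
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    # Reject a nonce that has already been accepted (replay).
    if nonce in seen_nonces:
        return False
    expected = sign_decision(secret, ticket_id, decision, reviewer, timestamp, nonce)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    seen_nonces.add(nonce)
    return True
```

Because the decision value itself is part of the signed message, a captured "rejected" callback cannot be rewritten into "approved" without invalidating the signature.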

Default-deny timeouts are safer

An unanswered approval is not approval. Set a deadline, expire the ticket, and default to denial when the deadline passes. The agent may create a new request with fresh context, but it should not silently reuse a stale approval.
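
The default-deny rule can be sketched as a tiny state machine in which only an explicit, on-time approval ever permits execution. The Ticket class and the record_decision and may_execute helpers are hypothetical names for illustration, and a real system would persist the state rather than hold it in memory.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"
    CANCELLED = "cancelled"


@dataclass
class Ticket:
    ticket_id: str
    deadline: float              # epoch seconds by which a decision must arrive
    state: State = State.PENDING


def record_decision(ticket: Ticket, approve: bool, now: float) -> State:
    """Apply a reviewer decision and return the resulting state."""
    # Once a ticket has left PENDING, its state never changes again.
    if ticket.state is not State.PENDING:
        return ticket.state
    # A decision that arrives after the deadline is treated as no decision.
    if now >= ticket.deadline:
        ticket.state = State.EXPIRED
        return ticket.state
    ticket.state = State.APPROVED if approve else State.REJECTED
    return ticket.state


def may_execute(ticket: Ticket, now: float) -> bool:
    """Default deny: only an explicit approval allows execution."""
    # Lazily expire a ticket nobody answered in time.
    if ticket.state is State.PENDING and now >= ticket.deadline:
        ticket.state = State.EXPIRED
    return ticket.state is State.APPROVED
```

An expired ticket stays expired even if a late approval arrives, which forces the agent to open a new request with fresh context.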

Make retries idempotent

Agents and notification workers retry. Without idempotency, a network timeout can create duplicate tickets or execute an approved action twice. Assign a stable idempotency key to each proposed action and enforce uniqueness at the data layer.
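
One way to enforce this at the data layer is a UNIQUE constraint on the key plus a conditional state update that only one worker can win. The sketch below uses SQLite from the Python standard library. The table layout, the way the key is derived from a run ID and the canonical action, and the extra 'executing' state are assumptions made for the example.

```python
import hashlib
import json
import sqlite3


def idempotency_key(agent_run_id: str, action: dict) -> str:
    """Derive a stable key: the same run and action always hash the same."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{agent_run_id}:{canonical}".encode()).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tickets (
               id INTEGER PRIMARY KEY,
               idempotency_key TEXT NOT NULL UNIQUE,
               action_json TEXT NOT NULL,
               state TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def create_ticket(conn: sqlite3.Connection, agent_run_id: str, action: dict) -> int:
    """Create a ticket, or return the existing one when this is a retry."""
    key = idempotency_key(agent_run_id, action)
    with conn:  # commits on success, rolls back on error
        # The UNIQUE constraint turns a duplicate insert into a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO tickets (idempotency_key, action_json) VALUES (?, ?)",
            (key, json.dumps(action, sort_keys=True)),
        )
    row = conn.execute(
        "SELECT id FROM tickets WHERE idempotency_key = ?", (key,)
    ).fetchone()
    return row[0]


def claim_execution(conn: sqlite3.Connection, ticket_id: int) -> bool:
    """Atomically move approved -> executing; only one caller can succeed."""
    with conn:
        cur = conn.execute(
            "UPDATE tickets SET state = 'executing' WHERE id = ? AND state = 'approved'",
            (ticket_id,),
        )
    return cur.rowcount == 1
```

A retried request returns the original ticket ID instead of creating a second ticket, and a second worker that tries to run the same approved action finds nothing left to claim.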

Keep review context small and precise

A reviewer needs the proposed action, target, expected effect, reason, risk factors, and rollback plan. They do not need full conversation history, raw credentials, or unrelated customer data. Minimize notification payloads and keep sensitive detail inside the self-hosted review system.
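
A simple way to keep payloads small is to build them from an explicit allowlist, so that new fields on the request never leak into notifications by accident. The field names and the build_review_payload helper below are hypothetical and chosen to mirror the list above.

```python
# Assumed field names, mirroring what a reviewer needs to see.
REVIEW_FIELDS = (
    "action",
    "target",
    "expected_effect",
    "reason",
    "risk_factors",
    "rollback_plan",
)


def build_review_payload(ticket_id: str, request: dict, review_url: str) -> dict:
    """Copy only allowlisted fields; everything else stays in the review system."""
    payload = {field: request[field] for field in REVIEW_FIELDS if field in request}
    payload["ticket_id"] = ticket_id
    # The link points back to the self-hosted system for any sensitive detail.
    payload["review_url"] = review_url
    return payload
```

An allowlist fails closed: a field that nobody thought about is omitted, whereas a blocklist would send it.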

Approval is not the final control

After approval, re-check preconditions. Confirm the resource still exists, the state has not changed, the approval has not expired, and the execution parameters still match the reviewed request. Use least-privilege credentials and record the actual outcome.
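
These checks can be collected into a single function that runs immediately before execution and returns every reason to stop. This is a sketch under assumptions: the Approval record, the parameter digest, and the resource_version value (for example an ETag or row version captured at review time) are hypothetical names, not part of any particular SDK.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Approval:
    params_digest: str      # digest of the parameters the reviewer saw
    resource_version: str   # resource version captured at review time
    valid_until: float      # epoch seconds after which the approval is stale


def digest(params: dict) -> str:
    """Hash a canonical JSON form so key order does not matter."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def precondition_failures(approval: Approval, params: dict,
                          current_version: Optional[str], now: float) -> List[str]:
    """Return every reason execution must not proceed (empty list means go)."""
    failures = []
    if now >= approval.valid_until:
        failures.append("approval expired")
    # None is used here to signal that the resource lookup found nothing.
    if current_version is None:
        failures.append("resource no longer exists")
    elif current_version != approval.resource_version:
        failures.append("resource state changed since review")
    if digest(params) != approval.params_digest:
        failures.append("parameters differ from reviewed request")
    return failures
```

Returning the full list rather than the first failure gives the audit trail a complete record of why an approved action was not executed.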

A minimal production checklist

Self-hosted implementation

Agent Handoff Production Kit

Get the source package, REST API, Python SDK, MCP server, Feishu integration, audit trail, Docker deployment, and security documentation as a one-time purchase.