Monitoring AI agents in production

Traditional monitoring asks one question: is the server up? If the endpoint returns 200, everything is fine. AI agents break that assumption. The server can be perfectly healthy while the agent silently produces wrong outputs, skips steps, runs over budget, or stops working entirely — all without triggering a single alert.

Monitoring autonomous agents requires a different mental model. Here's what actually breaks and how to catch it.

What traditional monitoring misses

Uptime monitoring tells you the endpoint responded. It says nothing about what the agent did inside that response. An agent endpoint that returns {"status": "ok"} in 50ms might have skipped the entire task due to a context length limit, a rate limit on the model API, or a malformed tool call that silently failed.

The failure modes specific to AI agents in production:

Silent tool failures. A tool call returns an error, and the model carries on without the result. The task "completes" but with missing data.
Context window exhaustion. Long-running agents hit token limits mid-task and truncate their work. The HTTP response is still 200.
Model API degradation. The underlying model API is slow or returning degraded outputs. Your endpoint is up; the work is wrong.
Drift over time. An agent that worked last week starts producing subtly different outputs as the model is updated. No alert fires — outputs just quietly change.
Scheduled run skips. The agent was supposed to run at 06:00. It didn't. Nothing in your existing monitoring catches this because the server never went down.
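Truncation usually leaves a concrete signal in the model response. The Anthropic Messages API sets "stop_reason" to "max_tokens", and the OpenAI Chat Completions API sets "finish_reason" to "length", when generation stops at the token limit. The sketch below turns that signal into a failure instead of a silent 200. It assumes you have the raw response body on hand, and the function name response_truncated is hypothetical. It is a grep heuristic, not a JSON parser:

```shell
#!/usr/bin/env bash
# Sketch: detect a model response that stopped at the token limit even though
# the HTTP call returned 200. Reads the raw JSON response body on stdin.
# Matches Anthropic's "stop_reason": "max_tokens" and OpenAI's
# "finish_reason": "length". grep is a heuristic here, not a JSON parser.

response_truncated() {
  grep -Eq '"(stop_reason|finish_reason)"[[:space:]]*:[[:space:]]*"(max_tokens|length)"'
}

# Example: treat truncation as a failed step instead of a success.
body='{"id":"msg_01","stop_reason":"max_tokens","content":[]}'
if printf '%s' "$body" | response_truncated; then
  echo "model output truncated; failing this run" >&2
fi
```

If jq is available, reading the field directly is more robust than pattern matching.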

Tickstem covers all three monitoring layers (uptime, heartbeat, and execution history) under one API key, with a free tier.

The three layers of agent monitoring

Layer 1: Uptime monitoring

Still necessary — just not sufficient. Your agent's HTTP endpoint should be monitored for availability and response time. A degraded model API often manifests first as increased latency before it causes failures.

Set up an uptime monitor on the endpoint your agent exposes. A 30-second check interval catches most outages before users do. Configure timeout alerts — if your agent normally responds in under 10 seconds and starts taking 90, something is wrong even if it's still returning 200.

curl -X POST https://api.tickstem.dev/v1/monitors \
  -H "Authorization: Bearer $TICKSTEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "summary-agent-endpoint",
    "url": "https://your-app.com/agents/summary/health",
    "interval_secs": 30,
    "timeout_secs": 15
  }'
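You can reproduce the latency rule above (normally under 10 seconds, suddenly 90) by hand or from a cron job of your own. The sketch below uses curl's real --max-time option and its -w write-out variables %{http_code} and %{time_total}. The helper names classify_check and probe are hypothetical. The 10-second budget and the 15-second cap mirror the numbers in this section:

```shell
#!/usr/bin/env bash
# Sketch: turn one HTTP check into "ok", "slow", or "down".
# A 200 that blows the latency budget still counts as a problem.

# Usage: classify_check <http_code> <seconds> [budget_seconds, default 10]
classify_check() {
  local code="$1" secs="$2" budget="${3:-10}"
  if [ "$code" != "200" ]; then
    echo "down"
  elif awk -v t="$secs" -v b="$budget" 'BEGIN { exit !(t > b) }'; then
    echo "slow"   # awk does the float comparison bash can't
  else
    echo "ok"
  fi
}

# Probe a URL once. --max-time caps the wait at 15 s; on a timeout or
# connection error curl reports the status code as 000.
probe() {
  local out
  out=$(curl -s -o /dev/null --max-time 15 -w '%{http_code} %{time_total}' "$1")
  classify_check "${out% *}" "${out#* }"
}
```

Running probe https://your-app.com/agents/summary/health prints one of the three words, so a wrapper can alert on anything other than ok.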

Layer 2: Heartbeat monitoring

Uptime tells you the server is alive. Heartbeat tells you the agent actually did the work.

A heartbeat monitor works as a dead man's switch: your agent sends a ping after each successful completion. If the ping stops arriving within the expected window, you get an alert. The server being up is irrelevant — if the work stopped happening, the heartbeat catches it.
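The "expected window" is the heartbeat interval plus a grace period, so the check behind the switch reduces to one comparison. This sketch illustrates that logic only and is not Tickstem's implementation. The helper heartbeat_status is hypothetical, and all times are Unix epoch seconds:

```shell
#!/usr/bin/env bash
# Sketch of the dead man's switch rule (illustrative, not Tickstem's code):
# a heartbeat is overdue once now > last_ping + interval + grace.
# All arguments are in seconds; timestamps are Unix epoch seconds.

# Usage: heartbeat_status <last_ping> <interval> <grace> [now]
heartbeat_status() {
  local last_ping="$1" interval="$2" grace="$3" now="${4:-$(date +%s)}"
  local deadline=$(( last_ping + interval + grace ))
  if [ "$now" -gt "$deadline" ]; then
    echo "overdue by $(( now - deadline ))s"
  else
    echo "ok ($(( deadline - now ))s left)"
  fi
}
```

The example below uses a 24-hour interval and a 1-hour grace, so the deadline lands 90,000 seconds (25 hours) after the last ping. heartbeat_status 0 86400 3600 90060 therefore prints "overdue by 60s".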

# Create a heartbeat (24h interval, 1h grace) and save the token it returns
curl -X POST https://api.tickstem.dev/v1/heartbeats \
  -H "Authorization: Bearer $TICKSTEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"daily-summary-agent","interval_secs":86400,"grace_secs":3600}'

# At the end of every successful agent run
curl -s -X POST https://api.tickstem.dev/v1/heartbeats/$HEARTBEAT_TOKEN/ping

The ping only fires on success — after the agent has verified its own output, written to the database, sent the downstream message, whatever the task requires. Silence means failure, regardless of what the HTTP response said.
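In a cron-driven agent, one way to enforce this is a small wrapper that calls the ping endpoint only when the run exits cleanly. The sketch assumes the agent command does its own verification and exits non-zero on any failure, including a tool error it would otherwise skip past. The names ping_on_success and send_ping are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: run the agent and send the heartbeat ping only if it succeeds.
# The wrapped command must do its own verification (output checked, DB write
# confirmed, message sent) and exit non-zero on any failure.

# Ping the heartbeat endpoint shown above; -f makes HTTP errors fail too.
send_ping() {
  if [ -z "${HEARTBEAT_TOKEN:-}" ]; then
    echo "HEARTBEAT_TOKEN is not set" >&2
    return 1
  fi
  curl -fsS -X POST "https://api.tickstem.dev/v1/heartbeats/$HEARTBEAT_TOKEN/ping" >/dev/null
}

# Usage: ping_on_success <command> [args...]
ping_on_success() {
  if "$@"; then
    send_ping
  else
    local rc=$?   # exit status of the failed command
    echo "agent run failed (exit $rc); no heartbeat sent" >&2
    return "$rc"
  fi
}
```

A cron entry would then call something like ping_on_success ./run-summary-agent.sh, where run-summary-agent.sh is a placeholder for your own entrypoint. Keeping send_ping separate also makes the wrapper easy to test with a stub.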

Layer 3: Execution history

The most underused layer. Every scheduled agent run should produce a logged record: when it ran, how long it took, whether it succeeded, and what it returned.

Without this, debugging a failure means reconstructing what happened from scattered logs across your application, the model API, and any tools the agent called. With it, you open the execution history and see immediately: the run at 06:03 took 4 minutes instead of the usual 45 seconds, returned a 500, and the response body contains a rate limit error from the model API.

If you're using HTTP-based scheduling for your agent, execution history comes for free — every run is logged with the full request and response. If you're using platform-native cron, you need to build this yourself or pull it from your application logs.
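If you do need to build it yourself, a wrapper that appends one JSON line per run covers the basics: start time, duration, exit code, and the tail of the output. In this sketch, record_run, HISTORY_FILE, and the field names are hypothetical. The hand-rolled escaping is only good enough for a sketch, and jq is the safer choice for real JSON encoding:

```shell
#!/usr/bin/env bash
# Sketch: append one JSON line per run to a history file when the scheduler
# doesn't keep execution history for you. HISTORY_FILE and the field names
# are assumptions. The escaping below is minimal; prefer jq in real use.

HISTORY_FILE="${HISTORY_FILE:-./agent-runs.jsonl}"

# Usage: record_run <command> [args...]   (passes the exit code through)
record_run() {
  local started t0 out rc=0 snippet
  started=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  t0=$(date +%s)
  out=$("$@" 2>&1) || rc=$?
  # Flatten control characters (newlines, tabs) to spaces, keep the last
  # 500 bytes, then escape backslashes and double quotes for JSON.
  snippet=$(printf '%s' "$out" | tr '[:cntrl:]' ' ' | tail -c 500 |
    sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"started":"%s","duration_secs":%d,"exit_code":%d,"output_tail":"%s"}\n' \
    "$started" "$(( $(date +%s) - t0 ))" "$rc" "$snippet" >> "$HISTORY_FILE"
  # Still show the agent's output to whoever is watching (cron mail, logs).
  printf '%s\n' "$out"
  return "$rc"
}
```

Wrapping the scheduled command as record_run ./run-summary-agent.sh (a placeholder name) leaves a file you can grep for slow or failed runs, or ship to your log pipeline. Durations are in whole seconds.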

A practical rule: any agent task that runs on a schedule and produces output that other systems depend on needs all three layers. Uptime alone is not monitoring — it's a pulse check.

Wiring it up via MCP

If you're building with Claude Code or a similar MCP-compatible agent, you can set up the full monitoring stack from within your editor. The Tickstem MCP server exposes create_monitor, create_heartbeat, and list_executions as native tools — the agent can register its own monitoring as part of the setup flow.

# tsk-mcp installed, API key configured
# Ask Claude Code to set up monitoring for your agent endpoint
"Create an uptime monitor for https://my-app.com/agents/health
 and a heartbeat for the daily-summary job with a 24h interval"

What good agent monitoring looks like

The goal is to answer three questions at any point in time, without digging through logs:

Is the agent endpoint reachable and responding normally? (uptime)
Did the agent complete its last scheduled task? (heartbeat)
What happened on the last N runs? (execution history)

When all three are in place, debugging shifts from "something might be wrong, let me check everything" to "here's exactly what happened and when."

Monitor your AI agents with Tickstem

Uptime monitoring, heartbeat checks, and execution history — one API key, free tier included.
