GET /v1/agents/{agent_id}/health
Check the health status of a deployed agent.
Request
GET https://api.run-agent.ai/v1/agents/{agent_id}/health
Authorization: Bearer YOUR_API_KEY
Path Parameters
agent_id (required): The unique identifier of the agent, for example agent-123
Response
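status (string): Overall health status, one of healthy, degraded, or unhealthy
checks (object): Individual health check results (agent, dependencies, resources)
checks.dependencies (object): Status of each external dependency, keyed by dependency name (for example, openai_api, database)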
Examples
Basic Health Check
curl https://api.run-agent.ai/v1/agents/agent-123/health \
-H "Authorization: Bearer YOUR_API_KEY"
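For Python callers, the curl request above can be reproduced with only the standard library. This is a minimal sketch that assumes the endpoint returns a JSON body; build_health_request and get_agent_health are hypothetical helper names, not part of an official SDK.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.run-agent.ai/v1"


def build_health_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for /v1/agents/{agent_id}/health (hypothetical helper)."""
    # Percent-encode the ID so characters such as "/" cannot alter the path
    path = f"/agents/{urllib.parse.quote(agent_id, safe='')}/health"
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def get_agent_health(agent_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the health payload; urlopen raises HTTPError on 4xx/5xx."""
    request = build_health_request(agent_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `health = get_agent_health("agent-123", "YOUR_API_KEY")`, then read `health["status"]`. Separating request construction from the network call keeps the URL and header logic easy to check without contacting the API.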
Response Examples
Healthy Agent
{
"status": "healthy",
"checks": {
"agent": {
"status": "healthy",
"response_time_ms": 45
},
"dependencies": {
"openai_api": "healthy",
"database": "healthy"
},
"resources": {
"memory_usage_percent": 65,
"cpu_usage_percent": 20
}
},
"version": "1.2.3",
"uptime": 3600,
"last_request": "2024-01-01T12:00:00Z"
}
Degraded Agent
{
"status": "degraded",
"checks": {
"agent": {
"status": "healthy",
"response_time_ms": 150
},
"dependencies": {
"openai_api": "healthy",
"database": "slow"
},
"resources": {
"memory_usage_percent": 85,
"cpu_usage_percent": 75
}
},
"version": "1.2.3",
"uptime": 7200,
"warnings": ["High memory usage", "Database latency detected"]
}
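When a response is not healthy, the useful detail is spread across checks.agent, checks.dependencies, and the optional warnings list. The sketch below collects those into one list of messages. It assumes payloads shaped like the two examples above; find_issues is a hypothetical helper, and dependency values other than healthy (such as slow) are simply reported as-is.

```python
def find_issues(health: dict) -> list:
    """Collect human-readable problems from a payload shaped like the examples."""
    issues = []
    checks = health.get("checks", {})
    # Agent self-check: report anything other than "healthy"
    agent = checks.get("agent", {})
    if agent.get("status", "healthy") != "healthy":
        issues.append(f"agent: {agent['status']}")
    # Dependencies map a name to a status string, e.g. "database": "slow"
    for name, state in checks.get("dependencies", {}).items():
        if state != "healthy":
            issues.append(f"dependency {name}: {state}")
    # "warnings" is optional; only the degraded example includes it
    issues.extend(health.get("warnings", []))
    return issues
```

For the degraded example this yields the slow database dependency followed by the two warning strings; for the healthy example it returns an empty list.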
Health Check Logic
The overall status is derived from the individual checks:
- Healthy: All checks pass
- Degraded: Some checks report warnings, but the agent is still functional
- Unhealthy: One or more critical checks fail
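The status is computed by the service, and this page does not say which checks are critical or what thresholds trigger a warning. The rules above can still be sketched as a small decision function. CheckResult, its outcome values (pass, warn, fail), and the critical flag are all hypothetical; treating a failed non-critical check as degraded is an interpretation of "the agent is still functional".

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    outcome: str    # hypothetical values: "pass", "warn", or "fail"
    critical: bool  # hypothetical: True if failing makes the agent non-functional


def overall_status(results: list) -> str:
    """Apply the healthy/degraded/unhealthy rules to a list of CheckResult."""
    # Unhealthy: at least one critical check fails
    if any(r.outcome == "fail" and r.critical for r in results):
        return "unhealthy"
    # Degraded: something is not passing, but no critical check has failed
    if any(r.outcome != "pass" for r in results):
        return "degraded"
    # Healthy: every check passes
    return "healthy"
```

Ordering the tests from most to least severe means a single critical failure dominates any number of warnings.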
Monitoring Integration
Automated Monitoring
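import time
import requests

def send_alert(message):
    # Placeholder: route this to your alerting system (email, Slack, pager, ...)
    print(f"ALERT: {message}")

def monitor_agent(agent_id, interval=60):
    # Poll the health endpoint and alert whenever the agent is not healthy
    while True:
        try:
            response = requests.get(
                f"https://api.run-agent.ai/v1/agents/{agent_id}/health",
                headers={"Authorization": "Bearer YOUR_API_KEY"},
                timeout=10,  # avoid hanging on a stalled connection
            )
            response.raise_for_status()  # count HTTP 4xx/5xx as a failed check
            health = response.json()
            if health['status'] != 'healthy':
                send_alert(f"Agent {agent_id} is {health['status']}")
        except Exception as e:
            # Network errors, HTTP errors, or an unparseable body
            send_alert(f"Health check failed: {e}")
        time.sleep(interval)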
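The monitor_agent loop sends an alert on every poll for as long as the agent stays degraded or unhealthy, which can be noisy. One way to alert only on status changes is to remember the last status per agent. StatusChangeDetector below is a hypothetical helper, not part of the API.

```python
class StatusChangeDetector:
    """Remember the last status seen per agent and report only transitions."""

    def __init__(self):
        self._last = {}  # agent_id -> last observed status

    def observe(self, agent_id, status):
        """Return an alert message if the status changed, else None."""
        previous = self._last.get(agent_id)
        self._last[agent_id] = status
        if previous is None:
            # First observation: alert only if the agent starts in a bad state
            return None if status == "healthy" else f"Agent {agent_id} is {status}"
        if previous != status:
            return f"Agent {agent_id} changed from {previous} to {status}"
        return None
```

Inside the polling loop, replace the direct alert with `message = detector.observe(agent_id, health['status'])` and call `send_alert(message)` only when it is not None. Recovery (for example degraded to healthy) is reported too, so on-call staff know when an incident has cleared.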
Prometheus Integration
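# Expose agent health metrics for Prometheus
import requests
from prometheus_client import Gauge, start_http_server
agent_health = Gauge('agent_health_status', 'Agent health status (1=healthy, 0.5=degraded, 0=unhealthy)', ['agent_id'])
memory_usage = Gauge('agent_memory_usage', 'Memory usage percentage', ['agent_id'])

def get_agent_health(agent_id):
    url = f"https://api.run-agent.ai/v1/agents/{agent_id}/health"
    response = requests.get(url, headers={"Authorization": "Bearer YOUR_API_KEY"}, timeout=10)
    response.raise_for_status()
    return response.json()

def update_metrics(agent_id):
    health = get_agent_health(agent_id)
    status_value = {'healthy': 1, 'degraded': 0.5, 'unhealthy': 0}
    agent_health.labels(agent_id=agent_id).set(status_value[health['status']])
    memory = health['checks']['resources']['memory_usage_percent']
    memory_usage.labels(agent_id=agent_id).set(memory)
# Serve /metrics once at startup with start_http_server(8000), then call update_metrics() periodically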
Best Practices
- Regular Monitoring: Check health every 30-60 seconds
- Set Alerts: Alert on status changes
- Track Trends: Monitor resource usage over time
- Implement Retries: Handle temporary network issues
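A short retry with exponential backoff keeps a single dropped connection from triggering an alert. This sketch wraps any zero-argument callable; with_retries is a hypothetical helper, and the defaults (3 attempts, 1 s base delay) are illustrative rather than recommended by the API. OSError is used as the default because both urllib's URLError and requests' RequestException derive from it; narrow retry_on if you do not want to retry errors such as HTTP 401.

```python
import random
import time


def with_retries(func, attempts=3, base_delay=1.0, max_delay=10.0,
                 retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable exception, back off exponentially and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller alert
            # Delay grows 1s, 2s, 4s, ... capped at max_delay, plus up to 50% jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Usage: `health = with_retries(lambda: get_agent_health("agent-123"))`. The jitter spreads out retries when many monitors fail at the same moment, and the injectable sleep makes the helper easy to test.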
See Also