Webhook Reliability Monitoring

Never miss a webhook again

Webhooks are the backbone of modern integrations. When your webhook endpoints go down, you miss critical events such as failed payments, new orders, and user actions. APIAssert helps ensure your endpoints are always ready to receive them.

The Problem

Webhook failures are silent killers:

  • Missed events — Payment provider sends event, your endpoint is down
  • Processing failures — Endpoint returns 200 but doesn't process
  • Queue buildup — Events accumulate while you're unaware
  • Retry exhaustion — Providers give up after failed retries

Unlike APIs you call, webhooks are pushed to you. If your endpoint is down, you might not know until data is missing.

How APIAssert Helps

Monitor Endpoint Availability

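Ensure your webhook endpoint is reachable. Accepting 400 alongside 200 lets an unsigned test payload be rejected by validation while still proving the handler is up: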

Monitor: Stripe Webhook Endpoint
URL: POST /webhooks/stripe
Headers:
  Content-Type: application/json
Body: {"type": "ping"}
Assertions:
  ✓ Status code == 200 or 400
  ✓ Response time < 1000ms
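
To make the two assertions above concrete, here is a minimal sketch of how one probe result could be evaluated. The function name `evaluateProbe` and the shape of the result object are hypothetical and are not APIAssert's API.

```javascript
// Hypothetical helper: checks one probe result against the monitor's assertions.
// `result` is assumed to look like { status: number, elapsedMs: number }.
function evaluateProbe(result, { allowedStatuses = [200, 400], maxMs = 1000 } = {}) {
  const failures = [];
  if (!allowedStatuses.includes(result.status)) {
    failures.push(`unexpected status ${result.status}`);
  }
  if (!(result.elapsedMs < maxMs)) {
    failures.push(`response took ${result.elapsedMs}ms (limit ${maxMs}ms)`);
  }
  return { ok: failures.length === 0, failures };
}

console.log(evaluateProbe({ status: 200, elapsedMs: 180 }).ok); // true
console.log(evaluateProbe({ status: 502, elapsedMs: 180 }).ok); // false
```

Collecting every failure, rather than stopping at the first, keeps an alert message informative when an endpoint is both slow and erroring.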

Monitor Processing Health

Check that webhook processing is working:

Monitor: Webhook Queue Health
URL: GET /api/webhooks/health
Assertions:
  ✓ $.queue_size < 100
  ✓ $.status == "healthy"
  ✓ $.last_event within 5 minutes
  ✓ $.error_rate < 0.05

Monitor Multiple Endpoints

Different services, different endpoints:

Monitors:
  ✓ POST /webhooks/stripe
  ✓ POST /webhooks/github
  ✓ POST /webhooks/slack
  ✓ POST /webhooks/shopify

Common Webhook Sources

Payment Providers

| Provider | Common Events |
|----------|---------------|
| Stripe   | payment_intent.succeeded, customer.subscription.updated |
| PayPal   | PAYMENT.CAPTURE.COMPLETED, BILLING.SUBSCRIPTION.ACTIVATED |
| Square   | payment.updated, refund.created |

E-commerce

| Platform    | Common Events |
|-------------|---------------|
| Shopify     | orders/create, products/update, inventory_levels/update |
| WooCommerce | order.created, product.updated |
| BigCommerce | store/order/created, store/product/updated |

Developer Tools

| Service | Common Events |
|---------|---------------|
| GitHub  | push, pull_request, issues |
| GitLab  | Push Hook, Merge Request Hook |
| Jira    | jira:issue_created, jira:issue_updated |

Communication

| Service  | Common Events |
|----------|---------------|
| Slack    | message, app_mention, reaction_added |
| Twilio   | message.received, call.completed |
| SendGrid | delivered, open, bounce |

Real-World Example

The Scenario

A marketplace app receives order webhooks from Shopify. When its webhook endpoint returns intermittent 502 errors, orders are not recorded in its system: customers receive their products, but the marketplace has no record of the sale.

The APIAssert Solution

Monitor 1: Webhook Endpoint

URL: POST /webhooks/shopify
Headers: X-Shopify-Topic: orders/create
Body: {"test": true}
Assertions:
  ✓ Status code == 200
  ✓ Response time < 2000ms
Interval: 1 minute

Monitor 2: Order Processing Health

URL: GET /api/orders/sync-status
Assertions:
  ✓ $.pending_webhooks < 50
  ✓ $.last_processed within 5 minutes
  ✓ $.success_rate > 0.99
Interval: 2 minutes

The Outcome

  • Endpoint monitoring caught 502 errors within 1 minute
  • Alert triggered before significant order backlog
  • Root cause: memory leak in webhook handler
  • Fix deployed with zero missed orders

What to Assert

Response Codes

Your webhook endpoint should respond correctly:

✓ 200 — Event processed successfully
✓ 202 — Event accepted for processing
✓ 400 — Invalid payload (your code, not downtime)
✗ 500 — Server error (problem!)
✗ 502 — Gateway error (problem!)
✗ 503 — Service unavailable (problem!)

Response Time

Webhook providers have timeout limits:

| Provider | Timeout    |
|----------|------------|
| Stripe   | 20 seconds |
| GitHub   | 10 seconds |
| Shopify  | 5 seconds  |
| Slack    | 3 seconds  |

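Set each assertion below the relevant provider's limit, with enough headroom that you are alerted before deliveries start timing out. For an endpoint with a 3-second limit:

Assertion: Response time < 2000ms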
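
One way to pick these thresholds consistently is to derive them from the provider timeouts. The sketch below is illustrative only: the helper name `responseTimeThreshold` and the default 50% headroom are assumptions, and the timeout values are simply the ones listed in the table above.

```javascript
// Provider timeouts in milliseconds, taken from the table above.
const PROVIDER_TIMEOUT_MS = { stripe: 20000, github: 10000, shopify: 5000, slack: 3000 };

// Hypothetical helper: the assertion threshold is a fraction of the provider's
// timeout, so an alert fires well before the provider would give up.
function responseTimeThreshold(provider, headroom = 0.5) {
  const timeout = PROVIDER_TIMEOUT_MS[provider];
  if (timeout === undefined) throw new Error(`unknown provider: ${provider}`);
  return Math.floor(timeout * headroom);
}

console.log(responseTimeThreshold('slack'));   // 1500
console.log(responseTimeThreshold('shopify')); // 2500
```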

Health Metrics

If you have a health endpoint, such as GET /api/webhooks/health, it can return metrics like these:

{
  "status": "healthy",
  "queue_size": 12,
  "processing_rate": 150,
  "error_rate": 0.001,
  "last_event": "2024-12-11T14:30:00Z"
}

Assertions:

  • $.status equals "healthy"
  • $.queue_size less than threshold
  • $.error_rate less than 0.05
  • $.last_event within expected timeframe
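
As a rough sketch, the four assertions above could be evaluated against the sample payload like this. The function name `checkHealth` and the default limits (queue of 100, 5% error rate, 5 minutes of idle time) are assumptions chosen for illustration.

```javascript
// Hypothetical evaluator for the health payload shown above.
// The default limits are illustrative, not values prescribed by APIAssert.
function checkHealth(health, now = new Date(), limits = {}) {
  const { maxQueueSize = 100, maxErrorRate = 0.05, maxIdleMs = 5 * 60 * 1000 } = limits;
  const failures = [];
  if (health.status !== 'healthy') failures.push(`status is ${health.status}`);
  if (!(health.queue_size < maxQueueSize)) failures.push(`queue_size ${health.queue_size}`);
  if (!(health.error_rate < maxErrorRate)) failures.push(`error_rate ${health.error_rate}`);
  const idleMs = now.getTime() - Date.parse(health.last_event);
  if (!(idleMs <= maxIdleMs)) failures.push(`last_event is ${Math.round(idleMs / 1000)}s old`);
  return failures;
}

const sample = {
  status: 'healthy',
  queue_size: 12,
  processing_rate: 150,
  error_rate: 0.001,
  last_event: '2024-12-11T14:30:00Z',
};

// Two minutes after the last event, every check passes.
console.log(checkHealth(sample, new Date('2024-12-11T14:32:00Z')).length); // 0
// Ten minutes after the last event, only the staleness check fails.
console.log(checkHealth(sample, new Date('2024-12-11T14:40:00Z')).length); // 1
```

Passing `now` as a parameter keeps the staleness check deterministic, which makes it easier to test than reading the clock inside the function.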

Best Practices

Respond Fast, Process Later

Your webhook endpoint should:

  1. Validate the request (signature, structure)
  2. Queue the event for processing
  3. Return 200 immediately

This ensures you don't time out while processing.
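
The three steps above can be sketched without any web framework. Everything here is a simplified assumption: the in-memory array stands in for a real queue, `isValid` only checks structure (signature verification would sit alongside it), and the names `handleWebhook` and `drainQueue` are hypothetical.

```javascript
// In-memory queue standing in for a real broker (an assumption for this sketch).
const queue = [];

// Hypothetical structural check; signature verification would also go here.
function isValid(payload) {
  return payload !== null && typeof payload === 'object' && typeof payload.type === 'string';
}

// Framework-agnostic handler: takes the raw body, returns { status, body }.
function handleWebhook(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400, body: 'Invalid JSON' };
  }
  if (!isValid(payload)) return { status: 400, body: 'Invalid payload' }; // 1. validate
  queue.push(payload);                                                     // 2. queue
  return { status: 200, body: 'Accepted' };                                // 3. respond
}

// A separate worker drains the queue outside the request path.
function drainQueue(processEvent) {
  let handled = 0;
  while (queue.length > 0) {
    processEvent(queue.shift());
    handled += 1;
  }
  return handled;
}

console.log(handleWebhook('{"type":"ping"}').status); // 200
console.log(handleWebhook('not json').status);        // 400
```

Because the handler only parses, validates, and enqueues, its response time stays roughly constant no matter how slow the downstream processing is.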

Monitor the Queue, Not Just the Endpoint

The endpoint can be up while processing is stuck:

Endpoint Up + Queue Growing = Problem
Endpoint Up + Queue Stable = Healthy
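
The rule above can be expressed as a small classifier over recent queue-size samples. This is a sketch under a deliberately strict assumption: "growing" means every sample is larger than the previous one. The function name `classifyQueue` is hypothetical.

```javascript
// Hypothetical classifier: compares recent queue-size samples (oldest first).
// "Growing" here means every sample is larger than the one before it.
function classifyQueue(endpointUp, samples) {
  if (!endpointUp) return 'endpoint down';
  const growing = samples.length >= 2 &&
    samples.every((size, i) => i === 0 || size > samples[i - 1]);
  return growing ? 'problem: queue growing' : 'healthy';
}

console.log(classifyQueue(true, [10, 40, 95])); // problem: queue growing
console.log(classifyQueue(true, [12, 9, 11]));  // healthy
```

A production check would probably tolerate short bursts, for example by comparing averages over two time windows instead of requiring strict growth.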

Use Signature Verification

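Most providers sign their webhooks. Verify the signature in your handler before trusting the payload:

// Stripe example: constructEvent throws if the signature does not match
const event = stripe.webhooks.constructEvent(
  body,          // raw request body, not parsed JSON
  signature,     // value of the Stripe-Signature header
  webhookSecret  // signing secret for this endpoint
);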
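
For providers without an SDK helper, verification usually comes down to recomputing an HMAC over the raw body and comparing it to the signature header. The sketch below assumes HMAC-SHA256 with a hex-encoded signature; the header name, encoding, and any timestamp or prefix handling differ between providers, so check the provider's documentation. The function name `verifySignature` is hypothetical.

```javascript
const crypto = require('crypto');

// Generic HMAC-SHA256 check. The hex encoding is an assumption for this
// sketch; each provider documents its own signing scheme.
function verifySignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}

const secret = 'example-secret';
const body = '{"type":"ping"}';
const goodSignature = crypto.createHmac('sha256', secret).update(body).digest('hex');

console.log(verifySignature(body, goodSignature, secret));       // true
console.log(verifySignature(body + ' ', goodSignature, secret)); // false
```

Using `crypto.timingSafeEqual` instead of `===` avoids leaking how many leading bytes of a forged signature were correct.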

Implement Idempotency

Providers retry failed deliveries, so the same event can arrive more than once. Use event IDs to prevent duplicate processing:

if (await wasProcessed(event.id)) {
  return res.status(200).send('Already processed');
}
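
A helper like `wasProcessed` needs somewhere to record event IDs. The sketch below uses an in-memory `Set` purely for illustration; that is an assumption, since a real deployment would need a shared, persistent store so that retries arriving at another instance or after a restart are still recognized. The name `handleOnce` is hypothetical.

```javascript
// In-memory record of handled event IDs (an assumption for this sketch).
const processedIds = new Set();

// Runs `processEvent` at most once per event ID and reports what happened.
function handleOnce(event, processEvent) {
  if (processedIds.has(event.id)) return 'duplicate';
  processEvent(event);
  // Recorded only after success, so a failed attempt can still be retried.
  processedIds.add(event.id);
  return 'processed';
}

let charges = 0;
const event = { id: 'evt_123', type: 'payment_intent.succeeded' };
console.log(handleOnce(event, () => { charges += 1; })); // processed
console.log(handleOnce(event, () => { charges += 1; })); // duplicate
console.log(charges);                                     // 1
```

Note that this check-then-record sequence is not atomic; with concurrent workers, an insert guarded by a unique constraint on the event ID is a common way to close that gap.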

Log Everything

When debugging webhook issues, logs are essential:

console.log('Webhook received:', {
  type: event.type,
  id: event.id,
  timestamp: new Date().toISOString()
});

Alert Configuration

Critical (Immediate)

Condition: Endpoint returns 5xx
Action: Page on-call engineer
Reason: Events are being dropped

Warning (Slack)

Condition: Response time > 2000ms
Action: Notify #engineering
Reason: Risk of timeout, needs investigation

Monitoring (Email)

Condition: Queue size > threshold
Action: Email team
Reason: Processing may be falling behind
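
The three tiers above amount to a small routing table. Here is one hedged way to express it in code; the function name `routeAlert`, the shape of the check result, and the default queue threshold of 100 are all assumptions, since the text leaves the threshold open.

```javascript
// Hypothetical routing mirroring the three tiers above, checked in order of
// severity. The default queue threshold of 100 is an assumed placeholder.
function routeAlert({ status, elapsedMs, queueSize }, queueThreshold = 100) {
  if (status >= 500) return { severity: 'critical', action: 'page on-call engineer' };
  if (elapsedMs > 2000) return { severity: 'warning', action: 'notify #engineering' };
  if (queueSize > queueThreshold) return { severity: 'monitoring', action: 'email team' };
  return null; // nothing to alert on
}

console.log(routeAlert({ status: 502, elapsedMs: 120, queueSize: 3 }).severity);  // critical
console.log(routeAlert({ status: 200, elapsedMs: 2400, queueSize: 3 }).severity); // warning
```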

Getting Started

  1. List your webhook endpoints — What services send you events?
  2. Create monitors per endpoint — POST to each with test payload
  3. Add health monitoring — If you have health endpoints
  4. Set appropriate assertions — Response code + time
  5. Configure alerts — Critical events to PagerDuty
