Webhook Reliability Monitoring

Webhooks are the backbone of modern integrations. When your webhook endpoints go down, you miss critical events — failed payments, new orders, user actions. APIAssert helps ensure your webhooks are always ready to receive.

The Problem

Webhook failures are silent killers:

Missed events — Payment provider sends event, your endpoint is down
Processing failures — Endpoint returns 200 but never actually processes the event
Queue buildup — Events accumulate while you're unaware
Retry exhaustion — Providers give up after failed retries

Unlike APIs you call, webhooks are pushed to you. If your endpoint is down, you might not know until data is missing.

How APIAssert Helps

Monitor Endpoint Availability

Ensure your webhook endpoint is reachable:

Monitor: Stripe Webhook Endpoint
URL: POST /webhooks/stripe
Headers:
  Content-Type: application/json
Body: {"type": "ping"}
Assertions:
  ✓ Status code == 200 or 400
  ✓ Response time < 1000ms
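
To make the assertions above concrete, here is a minimal sketch of the same check written by hand. It assumes Node 18+ (for the global `fetch`); `evaluateProbe` and `probeEndpoint` are hypothetical names, not part of APIAssert. A 400 is accepted because an unsigned test payload is usually rejected by the handler, which still shows the endpoint is reachable.

```javascript
// Hypothetical helper: decides whether a probe result satisfies the
// assertions above (status 200 or 400, response time under 1000 ms).
function evaluateProbe(
  { status, elapsedMs },
  { allowedStatuses = [200, 400], maxMs = 1000 } = {}
) {
  const failures = [];
  if (!allowedStatuses.includes(status)) {
    failures.push(`unexpected status ${status}`);
  }
  if (elapsedMs >= maxMs) {
    failures.push(`response took ${elapsedMs}ms (limit ${maxMs}ms)`);
  }
  return { ok: failures.length === 0, failures };
}

// Sends the test payload and measures the time until response headers arrive.
// Assumes Node 18+ (global fetch); `url` is your own webhook endpoint.
async function probeEndpoint(url) {
  const started = Date.now();
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'ping' }),
  });
  return evaluateProbe({
    status: response.status,
    elapsedMs: Date.now() - started,
  });
}
```

Keeping the evaluation separate from the network call makes the pass/fail rules easy to test without a live endpoint.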

Monitor Processing Health

Check that webhook processing is working:

Monitor: Webhook Queue Health
URL: GET /api/webhooks/health
Assertions:
  ✓ $.queue_size < 100
  ✓ $.processing == true
  ✓ $.last_processed within 5 minutes
  ✓ $.error_rate < 0.05
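
The four assertions above can be sketched as a plain function over the health response. This is an illustration only: `checkQueueHealth` is a hypothetical name, and it assumes `last_processed` is an ISO 8601 timestamp.

```javascript
// Hypothetical helper: applies the four assertions above to a parsed
// health response. Assumes `last_processed` is an ISO 8601 timestamp.
function checkQueueHealth(health, now = new Date()) {
  const FIVE_MINUTES_MS = 5 * 60 * 1000;
  // Date.parse returns NaN for a missing or malformed timestamp,
  // which makes the recency comparison below evaluate to false.
  const ageMs = now.getTime() - Date.parse(health.last_processed);
  const results = {
    queueSize: health.queue_size < 100,
    processing: health.processing === true,
    recentlyProcessed: ageMs <= FIVE_MINUTES_MS,
    errorRate: health.error_rate < 0.05,
  };
  const failed = Object.keys(results).filter((name) => !results[name]);
  return { ok: failed.length === 0, failed };
}
```

Returning the names of the failed checks, rather than a single boolean, tells you which part of the pipeline to look at first.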

Monitor Multiple Endpoints

Different services, different endpoints:

Monitors:
  ✓ POST /webhooks/stripe
  ✓ POST /webhooks/github
  ✓ POST /webhooks/slack
  ✓ POST /webhooks/shopify

Common Webhook Sources

Payment Providers

Provider	Common Events
Stripe	payment_intent.succeeded, customer.subscription.updated
PayPal	PAYMENT.CAPTURE.COMPLETED, BILLING.SUBSCRIPTION.ACTIVATED
Square	payment.updated, refund.created

E-commerce

Platform	Common Events
Shopify	orders/create, products/update, inventory_levels/update
WooCommerce	order.created, product.updated
BigCommerce	store/order/created, store/product/updated

Developer Tools

Service	Common Events
GitHub	push, pull_request, issues
GitLab	Push Hook, Merge Request Hook
Jira	jira:issue_created, jira:issue_updated

Communication

Service	Common Events
Slack	message, app_mention, reaction_added
Twilio	message.received, call.completed
SendGrid	delivered, opened, bounced

Real-World Example

The Scenario

A marketplace app receives order webhooks from Shopify. When their webhook endpoint experiences intermittent 502 errors, orders aren't recorded in their system. Customers receive products but the marketplace has no record.

The APIAssert Solution

Monitor 1: Webhook Endpoint

URL: POST /webhooks/shopify
Headers: X-Shopify-Topic: orders/create
Body: {"test": true}
Assertions:
  ✓ Status code == 200
  ✓ Response time < 2000ms
Interval: 1 minute

Monitor 2: Order Processing Health

URL: GET /api/orders/sync-status
Assertions:
  ✓ $.pending_webhooks < 50
  ✓ $.last_processed within 5 minutes
  ✓ $.success_rate > 0.99
Interval: 2 minutes

The Outcome

Endpoint monitoring caught 502 errors within 1 minute
Alert triggered before significant order backlog
Root cause: memory leak in webhook handler
Fix deployed with zero missed orders

What to Assert

Response Codes

Your webhook endpoint should respond correctly:

✓ 200 — Event processed successfully
✓ 202 — Event accepted for processing
✓ 400 — Invalid payload (your handler is up and rejecting the test request, so this is not downtime)
✗ 500 — Server error (problem!)
✗ 502 — Gateway error (problem!)
✗ 503 — Service unavailable (problem!)

Response Time

Webhook providers have timeout limits:

Provider	Timeout
Stripe	20 seconds
GitHub	10 seconds
Shopify	5 seconds
Slack	3 seconds

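Set each monitor's response-time assertion below the timeout of the provider that calls it. For a Shopify endpoint with a 5-second timeout, for example:

Assertion: Response time < 3000ms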
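
One way to pick a threshold per provider is to leave fixed headroom below its timeout. The sketch below uses the timeouts from the table above; `responseTimeThresholdMs` is a hypothetical helper, and the default safety factor of 0.5 is an arbitrary illustrative choice, not a recommendation from any provider.

```javascript
// Provider timeouts in milliseconds, copied from the table above.
const PROVIDER_TIMEOUT_MS = new Map([
  ['stripe', 20000],
  ['github', 10000],
  ['shopify', 5000],
  ['slack', 3000],
]);

// Hypothetical helper: returns a response-time threshold that leaves
// headroom below the provider's timeout. The 0.5 default is illustrative.
function responseTimeThresholdMs(provider, safetyFactor = 0.5) {
  const timeout = PROVIDER_TIMEOUT_MS.get(provider);
  if (timeout === undefined) {
    throw new Error(`no timeout known for provider "${provider}"`);
  }
  return Math.floor(timeout * safetyFactor);
}
```

With the default factor this gives 1500 ms for Slack and 2500 ms for Shopify, so an alert fires well before the provider would give up on the request.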

Health Metrics

If you have a health endpoint:

// GET /api/webhooks/health
{
  "status": "healthy",
  "queue_size": 12,
  "processing_rate": 150,
  "error_rate": 0.001,
  "last_event": "2024-12-11T14:30:00Z"
}

Assertions:

$.status equals "healthy"
$.queue_size less than threshold
$.error_rate less than 0.05
$.last_event within expected timeframe
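
If you do not have a health endpoint yet, the payload above could be assembled from a few counters. This is a sketch under stated assumptions: `buildHealthPayload` and its input fields are hypothetical, `processing_rate` is taken to mean events processed in the last minute, and the `"degraded"` status and 0.05 cutoff are illustrative choices.

```javascript
// Hypothetical builder for the health payload shown above.
// Assumptions: processing_rate is events per minute, and status flips
// to "degraded" (an illustrative value) once error_rate reaches 0.05.
function buildHealthPayload({
  queueSize,
  processedLastMinute,
  failedLastMinute,
  lastEventAt,
}) {
  const total = processedLastMinute + failedLastMinute;
  // Avoid dividing by zero when no events arrived in the window.
  const errorRate = total === 0 ? 0 : failedLastMinute / total;
  return {
    status: errorRate < 0.05 ? 'healthy' : 'degraded',
    queue_size: queueSize,
    processing_rate: processedLastMinute,
    error_rate: errorRate,
    last_event: lastEventAt.toISOString(),
  };
}
```

Serving this object as JSON from a route such as the one in the example gives the monitor a stable set of fields to assert on.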

Best Practices

Respond Fast, Process Later

Your webhook endpoint should:

Validate the request (signature, structure)
Queue the event for processing
Return 200 immediately

This ensures the request does not time out while the event is being processed.
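
The three steps above can be sketched as an Express-style handler. The in-memory array stands in for a real queue, and `createWebhookHandler`, `isValid`, and `drainQueue` are hypothetical names used only for illustration.

```javascript
// In-memory stand-in for a real queue (message broker, job table, etc.).
const pendingEvents = [];

// Express-style handler: validate, enqueue, acknowledge.
// `isValid` is a hypothetical callback for signature and structure checks.
function createWebhookHandler(isValid) {
  return function handleWebhook(req, res) {
    if (!isValid(req)) {
      return res.status(400).send('Invalid webhook');
    }
    // Only enqueue here; slow work stays off the request path.
    pendingEvents.push(req.body);
    return res.status(200).send('OK');
  };
}

// A separate worker drains the queue outside the request cycle.
async function drainQueue(processEvent) {
  while (pendingEvents.length > 0) {
    await processEvent(pendingEvents.shift());
  }
}
```

Because the handler only validates and enqueues, its response time stays roughly constant no matter how long each event takes to process.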

Monitor the Queue, Not Just the Endpoint

The endpoint can be up while processing is stuck:

Endpoint Up + Queue Growing = Problem
Endpoint Up + Queue Stable = Healthy
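
A simple way to tell "growing" from "stable" is to compare consecutive queue-size readings. The sketch below is one possible rule, not the only one: `isQueueGrowing` is a hypothetical helper, and requiring three strictly increasing samples is an arbitrary choice.

```javascript
// Hypothetical helper: `samples` are queue sizes from consecutive checks,
// oldest first. Reports growth only when the most recent `minSamples`
// readings are strictly increasing.
function isQueueGrowing(samples, minSamples = 3) {
  if (samples.length < minSamples) {
    return false; // not enough data to call it a trend
  }
  const recent = samples.slice(-minSamples);
  return recent.every((size, i) => i === 0 || size > recent[i - 1]);
}
```

A queue that fluctuates around a steady level does not trigger this check, while one that rises on every reading does.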

Use Signature Verification

Most providers sign webhooks. Verify in your handler:

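// Stripe example: constructEvent throws if the signature does not match.
// `body` must be the raw request body (not parsed JSON), and `signature`
// is the value of the Stripe-Signature request header.
const event = stripe.webhooks.constructEvent(
  body,
  signature,
  webhookSecret
);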
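
For providers without an SDK helper, verification often comes down to recomputing an HMAC over the raw body. The sketch below shows a generic HMAC-SHA256 check using Node's built-in `crypto` module. It is not any specific provider's scheme: the header name, the encoding (hex or base64), and exactly what is signed vary, so check the provider's documentation. `verifyHmacSignature` is a hypothetical name.

```javascript
const crypto = require('crypto');

// Generic sketch: recompute the HMAC-SHA256 of the raw body and compare
// it to a hex-encoded signature. Real providers differ in encoding,
// header name, and signed payload format.
function verifyHmacSignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') {
    return false; // header missing
  }
  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody)
    .digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
  );
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.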

Implement Idempotency

Webhooks can be retried. Use event IDs to prevent duplicate processing:

if (await wasProcessed(event.id)) {
  return res.status(200).send('Already processed');
}
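
The `wasProcessed` check above needs somewhere to remember event IDs. Here is a minimal in-memory sketch; `markProcessed` and `processOnce` are hypothetical helpers, and a real deployment would need a shared, persistent store (a database table or cache) rather than a `Set` in one process.

```javascript
// In-memory sketch only; IDs are lost on restart and not shared
// between instances.
const processedEventIds = new Set();

async function wasProcessed(eventId) {
  return processedEventIds.has(eventId);
}

async function markProcessed(eventId) {
  processedEventIds.add(eventId);
}

// Runs `handle(event)` at most once per event ID for sequential
// deliveries; resolves to true if the handler ran, false if skipped.
async function processOnce(event, handle) {
  if (await wasProcessed(event.id)) {
    return false;
  }
  await handle(event);
  // Mark only after success so a failed event can still be retried.
  await markProcessed(event.id);
  return true;
}
```

Two deliveries of the same event arriving at the same moment could both pass the check in this sketch; an atomic insert on a unique event ID column is one way to close that gap.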

Log Everything

When debugging webhook issues, logs are essential:

console.log('Webhook received:', {
  type: event.type,
  id: event.id,
  timestamp: new Date().toISOString()
});

Alert Configuration

Critical (Immediate)

Condition: Endpoint returns 5xx
Action: Page on-call engineer
Reason: Events are being dropped

Warning (Slack)

Condition: Response time > 2000ms
Action: Notify #engineering
Reason: Risk of timeout, needs investigation

Monitoring (Email)

Condition: Queue size > threshold
Action: Email team
Reason: Processing may be falling behind
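
The three tiers above amount to a small routing rule. The sketch below expresses it as a function; `alertFor` is a hypothetical name, and the default queue threshold of 100 is an assumption borrowed from the queue-health example earlier, since the tier itself only says "threshold".

```javascript
// Hypothetical routing rule for the three tiers above. The most severe
// matching condition wins. The queue threshold default is an assumption.
function alertFor({ status, responseTimeMs, queueSize }, queueThreshold = 100) {
  if (status >= 500 && status <= 599) {
    return { severity: 'critical', action: 'page on-call engineer' };
  }
  if (responseTimeMs > 2000) {
    return { severity: 'warning', action: 'notify #engineering' };
  }
  if (queueSize > queueThreshold) {
    return { severity: 'monitoring', action: 'email team' };
  }
  return null; // nothing to report
}
```

Checking the conditions in order of severity means a slow 502 pages the on-call engineer instead of only posting a warning.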

Getting Started

List your webhook endpoints — What services send you events?
Create monitors per endpoint — POST to each with test payload
Add health monitoring — If you have health endpoints
Set appropriate assertions — Response code + time
Configure alerts — Critical events to PagerDuty

Related Use Cases

Payment API Monitoring — Payment webhook monitoring
Third-party API Monitoring — Monitor APIs you depend on
