API Logging Schema: The Code We Copy Into Every Project
Author: WebGoodPeople
In our April 21 article we explained why a 5-field API log schema catches "silent" incidents. This one is practical: how to put that schema into an existing stack in 1–2 days of work, with no app rewrite and no downtime.
I'll show three code variants (Node.js/Express, Next.js API routes, Python/FastAPI), a Promtail config for Grafana Loki, example LogQL queries and alert rules, and a real cost calculation for your logs.
What exactly we log
A reminder of the target schema:
- req_id: UUID v4, generated on entry
- endpoint: Logical identifier (/api/catalog/filter), no parameters
- status: HTTP status (200, 404, 502)
- bytes: Response body size in bytes
- latency_ms: Request handling time in milliseconds
- data_version: Version of the data the response was built from
Plus the standard fields: timestamp, user_id (if authenticated), method, query_params (separately, not in endpoint).
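To make the schema concrete, here is a minimal Python sketch of one record being assembled. The helper name build_record and the sample value "index-v12" are illustrative assumptions, not part of any library. Note that it only separates the query string from the path; path parameters (such as an ID inside the URL) still need the framework's route template to produce a truly logical endpoint.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def build_record(method, url, status, body, latency_ms, data_version,
                 user_id=None, req_id=None):
    """Assemble one log record in the target schema (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "req_id": req_id or str(uuid.uuid4()),
        "endpoint": parts.path,                 # path only, no query string
        "query_params": parse_qs(parts.query),  # logged separately from endpoint
        "method": method,
        "status": status,
        "bytes": len(body),                     # size only, never the body itself
        "latency_ms": round(latency_ms),
        "data_version": data_version,
        "user_id": user_id,
    }


record = build_record(
    "GET", "/api/catalog/filter?color=red&size=m",
    200, b'{"items": []}', 41.7, "index-v12",
)
print(json.dumps(record))
```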
Node.js / Express middleware
The simplest variant. One middleware, plugged in at the start of the chain.
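// middleware/request-logger.js
const { randomUUID } = require('crypto');
const pino = require('pino')();

function requestLogger(req, res, next) {
  // Reuse an upstream request ID if there is one, so the same ID follows the request across services.
  req.reqId = req.headers['x-request-id'] || randomUUID();
  req.startTime = process.hrtime.bigint();

  const originalSend = res.send;
  let bytes = 0;
  res.send = function (body) {
    // res.send(object) and res.json() re-enter res.send with the serialized string,
    // so only strings and Buffers are measured. Buffer.byteLength throws on plain objects.
    if (typeof body === 'string' || Buffer.isBuffer(body)) {
      bytes = Buffer.byteLength(body);
    }
    return originalSend.call(this, body);
  };

  res.on('finish', () => {
    const latencyMs = Number(process.hrtime.bigint() - req.startTime) / 1e6;
    pino.info({
      req_id: req.reqId,
      // req.route is set only when a route matched; otherwise fall back to the raw path.
      endpoint: req.route?.path || req.path,
      method: req.method,
      status: res.statusCode,
      bytes,
      latency_ms: Math.round(latencyMs),
      data_version: res.getHeader('x-data-version') || null,
      user_id: req.user?.id || null,
    });
  });
  next();
}

module.exports = requestLogger;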
Plugging it into app.js:
const requestLogger = require('./middleware/request-logger');
app.use(requestLogger);
The data_version field is set in the handler via res.setHeader('x-data-version', indexVersion) — required on every endpoint that reads from ES, cache, or a materialized view.
Next.js API routes
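For Next.js we wrap the handler. The wrapper below targets App Router route handlers (NextRequest/NextResponse); Pages Router API routes use a different (req, res) signature and need their own variant of the same idea.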
// lib/with-logging.ts
import { randomUUID } from 'crypto';
import pino from 'pino';
import { NextRequest, NextResponse } from 'next/server';

const log = pino();

export function withLogging(
  handler: (req: NextRequest) => Promise<NextResponse>
) {
  return async function wrapped(req: NextRequest) {
    const reqId = req.headers.get('x-request-id') || randomUUID();
    const start = process.hrtime.bigint();
    const endpoint = new URL(req.url).pathname;
    let response: NextResponse;
    try {
      response = await handler(req);
    } catch (err) {
      log.error({ req_id: reqId, endpoint, err: String(err) });
      throw err;
    }
    // content-length may be missing on the response object (for example after
    // NextResponse.json()), which would log every response as 0 bytes.
    // In that case measure a clone of the body; this buffers it in memory.
    const contentLength = response.headers.get('content-length');
    const bytes = contentLength !== null
      ? Number(contentLength)
      : (await response.clone().arrayBuffer()).byteLength;
    const latencyMs = Number(process.hrtime.bigint() - start) / 1e6;
    log.info({
      req_id: reqId,
      endpoint,
      method: req.method,
      status: response.status,
      bytes,
      latency_ms: Math.round(latencyMs),
      data_version: response.headers.get('x-data-version'),
    });
    return response;
  };
}

Usage in a route handler:
// app/api/catalog/filter/route.ts
import { NextResponse } from 'next/server';
import { withLogging } from '@/lib/with-logging';
// searchCatalog and CURRENT_INDEX_VERSION come from your own application code.

export const POST = withLogging(async (req) => {
  const data = await searchCatalog(req);
  return NextResponse.json(data, {
    headers: { 'x-data-version': `index-v${CURRENT_INDEX_VERSION}` }
  });
});

Python / FastAPI middleware
# middleware.py
import json
import logging
import time
import uuid

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response

log = logging.getLogger("api")


class RequestLogger(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        req_id = request.headers.get("x-request-id") or str(uuid.uuid4())
        request.state.req_id = req_id
        start = time.perf_counter()
        response = await call_next(request)

        # Drain the body to measure it. This buffers the whole response in memory,
        # so keep streaming endpoints out of this middleware.
        body = b"".join([chunk async for chunk in response.body_iterator])

        # When FastAPI has matched a route, its path template (/items/{id}) keeps
        # parameters out of endpoint; otherwise fall back to the raw URL path.
        route = request.scope.get("route")
        log.info(json.dumps({
            "req_id": req_id,
            "endpoint": getattr(route, "path", request.url.path),
            "method": request.method,
            "status": response.status_code,
            "bytes": len(body),
            "latency_ms": int((time.perf_counter() - start) * 1000),
            "data_version": response.headers.get("x-data-version"),
        }))

        # The original body iterator is consumed, so rebuild the response.
        return Response(
            content=body,
            status_code=response.status_code,
            headers=dict(response.headers),
            media_type=response.media_type,
        )

Shipping to Loki (Grafana Labs)
We recommend Promtail or Fluent Bit. A Promtail config:
# promtail.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: api-logs
    static_configs:
      - targets: [localhost]
        labels:
          job: api
          env: production
          __path__: /var/log/api/*.json
    pipeline_stages:
      - json:
          expressions:
            req_id: req_id
            endpoint: endpoint
            status: status
            latency_ms: latency_ms
            data_version: data_version
      - labels:
          endpoint:
          status:
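In Loki, every request line is then searchable with LogQL filters, and endpoint and status are indexed as labels. Labels stay cheap only while endpoint is a logical route, not a raw URL with IDs in it.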
Query examples in LogQL
Find all the "silent" empty 200s from the last hour:
{job="api", status="200"} | json | bytes < 200 and endpoint!~".*health.*"
Trace a single request across all services:
{env="production"} | json | req_id="a3f1-4c2b-..."
p95 latency per endpoint over 15 minutes:
quantile_over_time(0.95,
{job="api"} | json | unwrap latency_ms [15m]
) by (endpoint)
Alert rules
As Grafana Alerting rules, or as Loki ruler rules that route through Alertmanager:
Empty-body alert (catches Black Friday):
- alert: EmptyBodyHighRate
  expr: |
    sum(rate({job="api", status="200"} |
      json | bytes < 200 [5m])) by (endpoint)
    > 0.2 * sum(rate({job="api", status="200"} [5m])) by (endpoint)
  for: 2m
  labels: { severity: critical }
  annotations:
    summary: "Endpoint {{ $labels.endpoint }} >20% empty 200 responses"
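Before trusting the rule in production, it can help to replay the same logic over a sample of records offline. The sketch below is one way to do that in Python, assuming records are already parsed into dicts with the schema fields. The function name and sample data are illustrative, and it deliberately ignores the 5-minute window and the 2-minute "for" clause.

```python
from collections import defaultdict

EMPTY_BYTES = 200      # same cutoff as the LogQL filter (bytes < 200)
MAX_EMPTY_RATIO = 0.2  # same 20% threshold as the alert expression


def empty_body_offenders(records):
    """Return {endpoint: ratio} for endpoints where too many 200s are near-empty."""
    totals = defaultdict(int)
    empties = defaultdict(int)
    for r in records:
        if r["status"] != 200:
            continue  # the alert only looks at successful responses
        totals[r["endpoint"]] += 1
        if r["bytes"] < EMPTY_BYTES:
            empties[r["endpoint"]] += 1
    return {
        ep: empties[ep] / totals[ep]
        for ep in totals
        if empties[ep] / totals[ep] > MAX_EMPTY_RATIO
    }


# Made-up sample: 3 of 10 catalog responses are near-empty, the cart is healthy.
sample = (
    [{"endpoint": "/api/catalog/filter", "status": 200, "bytes": 45}] * 3
    + [{"endpoint": "/api/catalog/filter", "status": 200, "bytes": 8200}] * 7
    + [{"endpoint": "/api/cart", "status": 200, "bytes": 900}] * 10
)
offenders = empty_body_offenders(sample)
print(offenders)
```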
Result-count drop alert (derived from bytes):
- alert: ResponseBytesDropped
  expr: |
    avg_over_time({job="api"} | json | unwrap bytes [5m])
    < 0.3 * avg_over_time({job="api"} | json | unwrap bytes [1h])
  for: 3m

Cost of logs (an important calculation)
A typical e-commerce site: 500k sessions/mo × 20 API requests/session = 10M records/mo.
Average record: ~300 bytes (JSON with 8 fields) = 3 GB/mo.
Grafana Cloud Loki: ~$0.50 per GB/mo at 30-day retention. Total cost ~$1.5/mo.
Self-hosted Loki on a $40 VPS: 90-day retention is about 9 GB at this volume, which fits in 15 GB of disk with room to spare.
The budget is trivial. The one rule: don't log the full response body (it's usually 100× larger than the structured record).
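The arithmetic above is easy to redo for your own traffic. A small Python sketch using the article's numbers as inputs (decimal gigabytes, no compression assumed; the function name is illustrative):

```python
def monthly_log_cost(sessions_per_month, requests_per_session,
                     bytes_per_record, price_per_gb):
    """Return (records, gigabytes, dollars) per month for structured API logs."""
    records = sessions_per_month * requests_per_session
    gigabytes = records * bytes_per_record / 1e9  # decimal GB
    return records, gigabytes, gigabytes * price_per_gb


records, gigabytes, dollars = monthly_log_cost(
    sessions_per_month=500_000,
    requests_per_session=20,
    bytes_per_record=300,
    price_per_gb=0.50,
)
retained_90d_gb = gigabytes * 3  # three months of logs kept on disk

print(f"{records:,} records/mo, {gigabytes:.1f} GB/mo, ${dollars:.2f}/mo")
print(f"90-day retention: {retained_90d_gb:.1f} GB")
```

Swap in your own session count and record size; the result scales linearly with both.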
What we don't recommend
Don't log full bodies. Log the size only. If you need detailed tracing of a specific request, use sampling (1 in 1000 records written with a body).
Don't log PII. No emails, phones, or addresses in logs. user_id is fine. Everything else only on an explicit business case signed off by security.
Don't use unstructured (free-text) logs. "ERROR: Something went wrong" is useless. Everything goes into structured JSON. Any text message belongs in a message field, not in the body of the line.
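The three rules above can be enforced in one place, right before a record is written. The following Python sketch is one possible shape, not a prescribed implementation: the allowlist, the helper names, and the body_sample field are assumptions for illustration. Sampling is keyed on a hash of req_id so the decision is stable for a given request across services.

```python
import hashlib
import json

# Only these fields ever reach the log line; anything else (email, phone, ...) is dropped.
ALLOWED_FIELDS = {
    "timestamp", "req_id", "endpoint", "method", "status",
    "bytes", "latency_ms", "data_version", "user_id",
}
SAMPLE_ONE_IN = 1000  # roughly 1 in 1000 records is written with a body


def should_log_body(req_id, one_in=SAMPLE_ONE_IN):
    """Deterministic sampling: the same req_id always gets the same answer."""
    digest = hashlib.sha256(req_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0


def to_log_line(record, body=None, message=None):
    """Serialize a record as structured JSON, applying the three rules."""
    line = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    if message is not None:
        line["message"] = message  # free text lives in a field, not in the line itself
    if body is not None and should_log_body(record["req_id"]):
        line["body_sample"] = body.decode("utf-8", errors="replace")
    return json.dumps(line)


raw = {"req_id": "r1", "endpoint": "/api/cart", "status": 200,
       "bytes": 10, "email": "someone@example.com"}
print(to_log_line(raw, message="cache miss"))
```

A sampled body can itself contain PII, so the sampled path deserves the same security sign-off as any other exception to the rule.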
What to do after rollout
- Collect a baseline for three weeks: p95 latency and average bytes for each endpoint.
- In week 3, set up alerts with thresholds based on the baseline.
- The first one or two incidents will show which fields you still need to add. Add them. Iterate.
- By month 2 you have a working observability platform that catches incidents before the client calls.
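The baseline step above can be sketched in a few lines of Python, assuming the records have been exported as dicts with the schema fields. The nearest-rank percentile and the multipliers are illustrative choices: 0.3 mirrors the ResponseBytesDropped rule, while the 2x latency factor is purely an assumption to tune against your own data.

```python
import math
from collections import defaultdict
from statistics import mean


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(len(ordered) * 95 / 100)
    return ordered[rank - 1]


def baseline(records):
    """Per-endpoint p95 latency and average response size."""
    by_endpoint = defaultdict(list)
    for r in records:
        by_endpoint[r["endpoint"]].append(r)
    return {
        ep: {
            "p95_latency_ms": p95([r["latency_ms"] for r in rows]),
            "avg_bytes": mean(r["bytes"] for r in rows),
        }
        for ep, rows in by_endpoint.items()
    }


def thresholds(base, latency_factor=2.0, bytes_factor=0.3):
    """Turn a baseline into alert thresholds (factors are assumptions)."""
    return {
        ep: {
            "max_p95_latency_ms": stats["p95_latency_ms"] * latency_factor,
            "min_avg_bytes": stats["avg_bytes"] * bytes_factor,
        }
        for ep, stats in base.items()
    }


# Made-up data: latencies 1..100 ms, every response 1000 bytes.
records = [
    {"endpoint": "/api/catalog/filter", "latency_ms": ms, "bytes": 1000}
    for ms in range(1, 101)
]
base = baseline(records)
limits = thresholds(base)
print(base)
print(limits)
```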
If you want it faster
A 48-hour audit: we look at your stack, hand you a concrete patch file for the middleware and ready-made Loki dashboards. Free, no strings attached.